[jira] [Created] (LUCENE-9107) CommonTermsQuery with huge no. of terms slower with top-k scoring
Tommaso Teofili created LUCENE-9107:
---
Summary: CommonTermsQuery with huge no. of terms slower with top-k scoring
Key: LUCENE-9107
URL: https://issues.apache.org/jira/browse/LUCENE-9107
Project: Lucene - Core
Issue Type: Bug
Affects Versions: 8.3
Components: core/search
Reporter: Tommaso Teofili

In [1] a {{CommonTermsQuery}} is used to perform a query with lots of (duplicate) terms. Using a max term frequency cutoff of 0.999 for low-frequency terms, the query, although big, finishes in around 200-300ms with Lucene 7.6.0. However, after upgrading the code to Lucene 8.x, the same query runs in 2-3s instead.

After digging into it, the speed regression seems to come from the top-k scoring that version 8 enables by default, though it is not yet clear where exactly in the code. When switching back to complete hit scoring [3], the speed returns to the initial 200-300ms on Lucene 8.3.x as well.

I am looking into why this is happening and whether it only concerns {{CommonTermsQuery}} or affects {{BooleanQuery}} as well.

[1] : https://github.com/tteofili/Anserini-embeddings/blob/nnsearch/src/main/java/io/anserini/embeddings/nn/fw/FakeWordsRunner.java
[3] : https://github.com/tteofili/anserini/blob/ann-paper-reproduce/src/main/java/io/anserini/analysis/vectors/ApproximateNearestNeighborEval.java#L174

--
This message was sent by Atlassian Jira (v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
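For readers unfamiliar with {{CommonTermsQuery}}, the cutoff mentioned above decides which terms count as "low frequency". The following is a toy, library-free Python sketch of that partitioning idea only; it is not Lucene's code, the names are made up, and it ignores that Lucene also accepts absolute (greater-than-1) cutoff values:

```python
# Toy sketch of a CommonTermsQuery-style max term frequency cutoff.
# A fractional cutoff of 0.999 means: a term is "high frequency" only
# if it occurs in more than 99.9% of the documents in the index.

def partition_terms(doc_freqs, max_doc, cutoff=0.999):
    """Split terms into (low_freq, high_freq) groups.

    doc_freqs: dict mapping term -> number of documents containing it
    max_doc:   total number of documents in the index
    cutoff:    fraction of max_doc above which a term is high frequency
    """
    low, high = [], []
    for term, df in doc_freqs.items():
        (high if df > cutoff * max_doc else low).append(term)
    return low, high

# With a 0.99 cutoff over 1000 docs, only near-stopwords are "high":
low, high = partition_terms({"the": 999, "lucene": 10, "f42": 1},
                            max_doc=1000, cutoff=0.99)
```

Low-frequency terms are the ones that must match, while the high-frequency group is scored more cheaply; with a cutoff as permissive as 0.999, almost every term lands in the low-frequency group.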
[jira] [Updated] (LUCENE-9107) CommonTermsQuery with huge no. of terms slower with top-k scoring
[ https://issues.apache.org/jira/browse/LUCENE-9107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tommaso Teofili updated LUCENE-9107:
---
Description:
In [1] a {{CommonTermsQuery}} is used to perform a query with lots of (duplicate) terms. Using a max term frequency cutoff of 0.999 for low-frequency terms, the query, although big, finishes in around 200-300ms with Lucene 7.6.0. However, after upgrading the code to Lucene 8.x, the same query runs in 2-3s instead. After digging into it, the speed regression seems to come from the top-k scoring that version 8 enables by default, though it is not yet clear where exactly in the code. When switching back to complete hit scoring [3], the speed returns to the initial 200-300ms on Lucene 8.3.x as well. I am looking into why this is happening and whether it only concerns {{CommonTermsQuery}} or affects {{BooleanQuery}} as well.

[1] : https://github.com/tteofili/Anserini-embeddings/blob/nnsearch/src/main/java/io/anserini/embeddings/nn/fw/FakeWordsRunner.java
[2] : https://github.com/castorini/anserini/blob/master/src/main/java/io/anserini/analysis/vectors/ApproximateNearestNeighborEval.java
[3] : https://github.com/tteofili/anserini/blob/ann-paper-reproduce/src/main/java/io/anserini/analysis/vectors/ApproximateNearestNeighborEval.java#L174

was:
In [1] a {{CommonTermsQuery}} is used in order to perform a query with lots of (duplicate) terms. Using a max term frequency cutoff of 0.999 for low frequency terms, the query, although big, finishes in around 2-300ms with Lucene 7.6.0. However, when upgrading the code to Lucene 8.x, the query runs in 2-3s instead. After digging a bit into it it seems that the regression in speed comes from the fact that top-k scoring introduced by default in version 8 is causing that, not sure "where" exactly in the code though. When switching back to complete hit scoring [3], the speed goes back to the initial 2-300ms also in Lucene 8.3.x. I am looking into why this is happening and if it is only concerning {{CommonTermsQuery}} or affecting {{BooleanQuery}} as well.

[1] : https://github.com/tteofili/Anserini-embeddings/blob/nnsearch/src/main/java/io/anserini/embeddings/nn/fw/FakeWordsRunner.java
[3] : https://github.com/tteofili/anserini/blob/ann-paper-reproduce/src/main/java/io/anserini/analysis/vectors/ApproximateNearestNeighborEval.java#L174
[jira] [Updated] (LUCENE-9107) CommonTermsQuery with huge no. of terms slower with top-k scoring
[ https://issues.apache.org/jira/browse/LUCENE-9107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tommaso Teofili updated LUCENE-9107:
---
Description:
In [1] a {{CommonTermsQuery}} is used to perform a query with lots of (duplicate) terms. Using a max term frequency cutoff of 0.999 for low-frequency terms, the query, although big, finishes in around 200-300ms with Lucene 7.6.0. However, after upgrading the code to Lucene 8.x, the same query runs in 2-3s instead. After digging into it, the speed regression seems to come from the top-k scoring that version 8 enables by default, though it is not yet clear where exactly in the code. When switching back to complete hit scoring [3], the speed returns to the initial 200-300ms on Lucene 8.3.x as well. I am looking into why this is happening and whether it only concerns {{CommonTermsQuery}} or affects {{BooleanQuery}} as well.

[1] : https://github.com/tteofili/Anserini-embeddings/blob/nnsearch/src/main/java/io/anserini/embeddings/nn/fw/FakeWordsRunner.java
[3] : https://github.com/tteofili/anserini/blob/ann-paper-reproduce/src/main/java/io/anserini/analysis/vectors/ApproximateNearestNeighborEval.java#L174

was:
(same description, but with a broken {BooleanQuery}} markup token)
[jira] [Updated] (LUCENE-9107) CommonTermsQuery with huge no. of terms slower with top-k scoring
[ https://issues.apache.org/jira/browse/LUCENE-9107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tommaso Teofili updated LUCENE-9107:
---
Description:
In [1] a {{CommonTermsQuery}} is used to perform a query with lots of (duplicate) terms. Using a max term frequency cutoff of 0.999 for low-frequency terms, the query, although big, finishes in around 200-300ms with Lucene 7.6.0. However, after upgrading the code to Lucene 8.x, the same query runs in 2-3s instead [2]. After digging into it, the speed regression seems to come from the top-k scoring that version 8 enables by default, though it is not yet clear where exactly in the code. When switching back to complete hit scoring [3], the speed returns to the initial 200-300ms on Lucene 8.3.x as well. I am looking into why this is happening and whether it only concerns {{CommonTermsQuery}} or affects {{BooleanQuery}} as well.

[1] : https://github.com/tteofili/Anserini-embeddings/blob/nnsearch/src/main/java/io/anserini/embeddings/nn/fw/FakeWordsRunner.java
[2] : https://github.com/castorini/anserini/blob/master/src/main/java/io/anserini/analysis/vectors/ApproximateNearestNeighborEval.java
[3] : https://github.com/tteofili/anserini/blob/ann-paper-reproduce/src/main/java/io/anserini/analysis/vectors/ApproximateNearestNeighborEval.java#L174

was:
(same description, without the "[2]" citation after "2-3s instead")
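Some background on the "complete hit scoring" toggle discussed in this thread: Lucene 8 collectors track the total hit count only up to a threshold, after which documents that cannot make the top-k may be skipped, so the reported count becomes a lower bound. The sketch below is a minimal, self-contained Python illustration of that trade-off; it is emphatically not Lucene's WAND/collector implementation, just a toy model of "threshold-based top-k vs. counting everything":

```python
import heapq

def top_k(scored_docs, k, total_hits_threshold):
    """Collect the k highest-scoring docs from (doc_id, score) pairs.

    Once `total_hits_threshold` hits have been counted, documents that
    cannot beat the current k-th best score are skipped and no longer
    counted, so the returned hit count is only a lower bound. Passing
    float('inf') forces complete hit counting (every match is scored).
    """
    heap, hit_count = [], 0
    for doc, score in scored_docs:
        if hit_count >= total_hits_threshold and heap and score <= heap[0][0]:
            continue  # non-competitive: skipped, not counted
        hit_count += 1
        heapq.heappush(heap, (score, doc))
        if len(heap) > k:
            heapq.heappop(heap)  # keep only the k best
    top = [doc for _, doc in sorted(heap, reverse=True)]
    return top, hit_count

docs = [(i, float(i % 10)) for i in range(100)]
_, approx = top_k(docs, k=5, total_hits_threshold=10)           # lower-bound count
_, exact = top_k(docs, k=5, total_hits_threshold=float("inf"))  # counts all 100
```

The regression reported here suggests that for a disjunction with a huge number of terms, the bookkeeping needed to decide what is "non-competitive" can itself cost more than simply scoring everything.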
[jira] [Assigned] (LUCENE-9102) Add maxQueryLength option to DirectSpellchecker
[ https://issues.apache.org/jira/browse/LUCENE-9102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bruno Roustant reassigned LUCENE-9102:
--
Assignee: Bruno Roustant

> Add maxQueryLength option to DirectSpellchecker
> ---
> Key: LUCENE-9102
> URL: https://issues.apache.org/jira/browse/LUCENE-9102
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/spellchecker
> Reporter: Andy Webb
> Assignee: Bruno Roustant
> Priority: Minor
> Time Spent: 1h
> Remaining Estimate: 0h
>
> Attempting to spellcheck some long query terms can trigger {{org.apache.lucene.util.automaton.TooComplexToDeterminizeException}}. This change (previously discussed in SOLR-13190) adds a {{maxQueryLength}} option to {{DirectSpellchecker}} so that Lucene can be configured to not attempt to spellcheck terms over a specified length.
>
> PR: https://github.com/apache/lucene-solr/pull/1103
> Dependent Solr issue: SOLR-14131
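The option described above amounts to a length guard applied before the expensive fuzzy-automaton lookup. The Python sketch below shows the idea only; the names, the callback-based lookup, and the "0 disables the limit" convention are assumptions for illustration, not the actual DirectSpellChecker API:

```python
def suggest(term, fuzzy_lookup, max_query_length=0):
    """Return spelling suggestions for `term`.

    When `max_query_length` is positive and the term is longer, return no
    suggestions instead of running the fuzzy lookup: very long terms are
    what can blow up Levenshtein-automaton construction with
    TooComplexToDeterminizeException, so they are skipped up front.
    """
    if 0 < max_query_length < len(term):
        return []  # term too long: skip spellchecking entirely
    return fuzzy_lookup(term)

# A stand-in for the real fuzzy-automaton lookup:
fake_lookup = lambda t: [t + "s"]
suggest("lucene", fake_lookup, max_query_length=10)   # looked up normally
suggest("a" * 500, fake_lookup, max_query_length=10)  # guarded: skipped
```

The guard is deliberately checked before any automaton work happens, which is why it prevents the exception rather than merely catching it.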
[GitHub] [lucene-solr] asfgit closed pull request #1103: LUCENE-9102: add maxQueryLength option to DirectSpellChecker
asfgit closed pull request #1103: LUCENE-9102: add maxQueryLength option to DirectSpellChecker
URL: https://github.com/apache/lucene-solr/pull/1103

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
With regards, Apache Git Services
[jira] [Commented] (LUCENE-9102) Add maxQueryLength option to DirectSpellchecker
[ https://issues.apache.org/jira/browse/LUCENE-9102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17002195#comment-17002195 ] ASF subversion and git services commented on LUCENE-9102:
-
Commit 45dce3431688b3e3094b02f8dc824183b055c212 in lucene-solr's branch refs/heads/master from Andy Webb [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=45dce34 ]
LUCENE-9102: Add maxQueryLength option to DirectSpellchecker. Closes #1103
[jira] [Commented] (SOLR-13778) Windows JDK SSL Test Failure trend: SSLException: Software caused connection abort: recv failed
[ https://issues.apache.org/jira/browse/SOLR-13778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17002197#comment-17002197 ] Uwe Schindler commented on SOLR-13778:
--
Hi, about the JDK error handling I opened [https://bugs.openjdk.java.net/browse/JDK-8236498] on behalf of [~dweiss]. Thanks Dawid!

> Windows JDK SSL Test Failure trend: SSLException: Software caused connection abort: recv failed
> ---
> Key: SOLR-13778
> URL: https://issues.apache.org/jira/browse/SOLR-13778
> Project: Solr
> Issue Type: Bug
> Security Level: Public (Default Security Level. Issues are Public)
> Reporter: Chris M. Hostetter
> Priority: Major
> Attachments: RecvFailedTest.java, dumps-LegacyCloud.zip, logs-2019-12-12-1.zip, recv-multiple-2019-12-18.zip
>
> Now that Uwe's Jenkins build has been correctly reporting its build results for my [automated reports|http://fucit.org/solr-jenkins-reports/failure-report.html] to pick up, I've noticed a pattern of failures that indicates a definite problem with using SSL on Windows (even with Java 11.0.4).
> The symptomatic stack traces all contain...
> {noformat}
> ...
>    [junit4]> Caused by: javax.net.ssl.SSLException: Software caused connection abort: recv failed
>    [junit4]>at java.base/sun.security.ssl.Alert.createSSLException(Alert.java:127)
> ...
>    [junit4]> Caused by: java.net.SocketException: Software caused connection abort: recv failed
>    [junit4]>at java.base/java.net.SocketInputStream.socketRead0(Native Method)
> ...
> {noformat}
> I suspect this may be related to [https://bugs.openjdk.java.net/browse/JDK-8209333] but I have no concrete evidence to back this up.
> I'll post some details of my analysis in comments...
[jira] [Commented] (LUCENE-9102) Add maxQueryLength option to DirectSpellchecker
[ https://issues.apache.org/jira/browse/LUCENE-9102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17002236#comment-17002236 ] ASF subversion and git services commented on LUCENE-9102:
-
Commit ab1dc42c63a77162a2cf4ea6985364583e07bdc5 in lucene-solr's branch refs/heads/branch_8x from Bruno Roustant [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=ab1dc42 ]
LUCENE-9102: Add maxQueryLength option to DirectSpellchecker.
[jira] [Comment Edited] (SOLR-12490) Query DSL supports for further referring and exclusion in JSON facets
[ https://issues.apache.org/jira/browse/SOLR-12490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16926538#comment-16926538 ] Mikhail Khludnev edited comment on SOLR-12490 at 12/23/19 1:32 PM:
---
-I think we'd rather continue with adding yet another small cut.-
{code:java}
{
  "query" : {...},
  "params" : {
    "childFq" : [
      { "#color" : "color:black" },
      { "#size" : "size:L" }
    ]
  },
  "facet" : {
    "sku_colors_in_prods" : {
      "type" : "terms",
      "field" : "color",
      "domain" : {
        "excludeTags" : ["top", "color"],
        "filter" : [ "{!json_param}childFq" ]
      }
    }
  }
}
{code}
-Ideas are:-
* -put json as a param value; the parser garbles it to a meaningless string, but it's still available via {{req.getJSON()}}.-
* -the filter string invokes a new query parser which converts the json param into query DSL; need to decide how to keep the {{JsonQueryConverter}} counter.-

-Shouldn't be a big deal. Right?-

was (Author: mkhludnev):
The same comment text and code, without the strikethrough.

> Query DSL supports for further referring and exclusion in JSON facets
> ---
> Key: SOLR-12490
> URL: https://issues.apache.org/jira/browse/SOLR-12490
> Project: Solr
> Issue Type: Improvement
> Components: Facet Module, faceting
> Reporter: Mikhail Khludnev
> Priority: Major
> Labels: newdev
> Attachments: SOLR-12490.patch, SOLR-12490.patch, image-2019-10-21-09-37-37-118.png
>
> It's a spin-off from the [discussion|https://issues.apache.org/jira/browse/SOLR-9685?focusedCommentId=16508720&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16508720].
>
> h2. Problem
> # after SOLR-9685 we can tag separate clauses in hairy queries like {{parent}}, {{bool}}
> # we can {{domain.excludeTags}}
> # we are looking for child faceting with exclusions, see SOLR-9510, SOLR-8998
> # but we can refer only to separate params in {{domain.filter}}; it's not possible to refer to separate clauses
>
> see the first comment
[jira] [Comment Edited] (SOLR-12490) Query DSL supports for further referring and exclusion in JSON facets
[ https://issues.apache.org/jira/browse/SOLR-12490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16948013#comment-16948013 ] Mikhail Khludnev edited comment on SOLR-12490 at 12/23/19 1:32 PM:
---
-Handling an array as in the snippet above is too cumbersome. Here's the no-brainer approach. Attaching a rough cut.-
{code:java}
{
  "query" : {...},
  "params" : {
    "color_fq" : { "#color" : "color:black" },
    "size_fq" : { "#size" : "size:L" }
  },
  "facet" : {
    "sku_colors_in_prods" : {
      "type" : "terms",
      "field" : "color",
      "domain" : {
        "excludeTags" : ["top", "color"],
        "filter" : [ "{!json_param}color_fq", "{!json_param}size_fq" ]
      }
    }
  }
}
{code}
-Opinions?-

was (Author: mkhludnev):
The same struck-through comment text and code, with "Opinions?" not struck through.
[jira] [Comment Edited] (SOLR-12490) Query DSL supports for further referring and exclusion in JSON facets
[ https://issues.apache.org/jira/browse/SOLR-12490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16948013#comment-16948013 ] Mikhail Khludnev edited comment on SOLR-12490 at 12/23/19 1:32 PM:
---
-Handling an array as in the snippet above is too cumbersome. Here's the no-brainer approach. Attaching a rough cut.-
{code:java}
{
  "query" : {...},
  "params" : {
    "color_fq" : { "#color" : "color:black" },
    "size_fq" : { "#size" : "size:L" }
  },
  "facet" : {
    "sku_colors_in_prods" : {
      "type" : "terms",
      "field" : "color",
      "domain" : {
        "excludeTags" : ["top", "color"],
        "filter" : [ "{!json_param}color_fq", "{!json_param}size_fq" ]
      }
    }
  }
}
{code}
Opinions?

was (Author: mkhludnev):
The same comment text and code, without the strikethrough.
[jira] [Comment Edited] (SOLR-12490) Query DSL supports for further referring and exclusion in JSON facets
[ https://issues.apache.org/jira/browse/SOLR-12490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16955801#comment-16955801 ] Mikhail Khludnev edited comment on SOLR-12490 at 12/23/19 1:33 PM:
---
-Renamed it \{!json} to avoid a clash with nestedQP. Here's how it looks now:-
!image-2019-10-21-09-37-37-118.png!

was (Author: mkhludnev):
Renamed it \{!json} to avoid a clash with nestedQP. Here's how it looks now:
!image-2019-10-21-09-37-37-118.png!
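Based on the comments above, a facet request after the rename would presumably look like the earlier snippets with {!json} in place of {!json_param}. This is a sketch inferred from the discussion, not verified against the final patch:

{code:java}
{
  "query" : "*:*",
  "params" : {
    "color_fq" : { "#color" : "color:black" },
    "size_fq" : { "#size" : "size:L" }
  },
  "facet" : {
    "sku_colors_in_prods" : {
      "type" : "terms",
      "field" : "color",
      "domain" : {
        "excludeTags" : ["top", "color"],
        "filter" : [ "{!json}color_fq", "{!json}size_fq" ]
      }
    }
  }
}
{code}

The tagged clauses ({{#color}}, {{#size}}) stay referenceable and excludable from the facet domain, which is the whole point of the issue.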
[GitHub] [lucene-solr] andywebb1975 opened a new pull request #1112: SOLR-14131: add maxQueryLength option
andywebb1975 opened a new pull request #1112: SOLR-14131: add maxQueryLength option
URL: https://github.com/apache/lucene-solr/pull/1112

This is a work-in-progress - I'm trying to get tests working.

# Description
Please provide a short description of the changes you're making with this pull request.

# Solution
Please provide a short description of the approach taken to implement your solution.

# Tests
Please describe the tests you've developed or run to confirm this patch implements the feature or solves the problem.

# Checklist
Please review the following and check all that apply:
- [ ] I have reviewed the guidelines for [How to Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms to the standards described there to the best of my ability.
- [ ] I have created a Jira issue and added the issue ID to my pull request title.
- [ ] I have given Solr maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended)
- [ ] I have developed this patch against the `master` branch.
- [ ] I have run `ant precommit` and the appropriate test suite.
- [ ] I have added tests for my changes.
- [ ] I have added documentation for the [Ref Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) (for Solr changes only).
[GitHub] [lucene-solr] andywebb1975 closed pull request #1107: SOLR-14131: add maxQueryLength option to DirectSolrSpellChecker
andywebb1975 closed pull request #1107: SOLR-14131: add maxQueryLength option to DirectSolrSpellChecker URL: https://github.com/apache/lucene-solr/pull/1107
[jira] [Updated] (SOLR-14131) Add maxQueryLength option to DirectSolrSpellchecker
[ https://issues.apache.org/jira/browse/SOLR-14131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Webb updated SOLR-14131: - Description: Attempting to spellcheck some long query terms can trigger org.apache.lucene.util.automaton.TooComplexToDeterminizeException. This change (previously discussed in SOLR-13190, and dependent on LUCENE-9102) adds a maxQueryLength option to DirectSolrSpellChecker so that Lucene/Solr can be configured to not attempt to spellcheck terms over a specified length. Here's a draft PR: https://github.com/apache/lucene-solr/pull/1112 (I'm struggling writing tests, and we should update the Solr docs too.) was:Attempting to spellcheck some long query terms can trigger org.apache.lucene.util.automaton.TooComplexToDeterminizeException. This change (previously discussed in SOLR-13190, and dependent on LUCENE-9102) adds a maxQueryLength option to DirectSolrSpellchecker so that Lucene/Solr can be configured to not attempt to spellcheck terms over a specified length. > Add maxQueryLength option to DirectSolrSpellchecker > --- > > Key: SOLR-14131 > URL: https://issues.apache.org/jira/browse/SOLR-14131 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: search >Reporter: Andy Webb >Priority: Minor > Time Spent: 0.5h > Remaining Estimate: 0h > > Attempting to spellcheck some long query terms can trigger > org.apache.lucene.util.automaton.TooComplexToDeterminizeException. This > change (previously discussed in SOLR-13190, and dependent on LUCENE-9102) > adds a maxQueryLength option to DirectSolrSpellChecker so that Lucene/Solr > can be configured to not attempt to spellcheck terms over a specified length. > Here's a draft PR: https://github.com/apache/lucene-solr/pull/1112 (I'm > struggling writing tests, and we should update the Solr docs too.) 
[GitHub] [lucene-solr] andywebb1975 closed pull request #1112: SOLR-14131: add maxQueryLength option
andywebb1975 closed pull request #1112: SOLR-14131: add maxQueryLength option URL: https://github.com/apache/lucene-solr/pull/1112
[jira] [Created] (LUCENE-9108) eliminate JKS keystore from solr SSL docs
Robert Muir created LUCENE-9108: --- Summary: eliminate JKS keystore from solr SSL docs Key: LUCENE-9108 URL: https://issues.apache.org/jira/browse/LUCENE-9108 Project: Lucene - Core Issue Type: Improvement Reporter: Robert Muir On the "Enabling SSL" page: https://lucene.apache.org/solr/guide/8_3/enabling-ssl.html#enabling-ssl The first step is currently to create a JKS keystore. The next step immediately converts the JKS keystore into PKCS12, so that openssl can then be used to extract key material in PEM format for use with curl. Now that PKCS12 is java's default keystore format, why not omit step 1 entirely? What am I missing? PKCS12 is a more commonly understood/standardized format.
[jira] [Moved] (SOLR-14141) eliminate JKS keystore from solr SSL docs
[ https://issues.apache.org/jira/browse/SOLR-14141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir moved LUCENE-9108 to SOLR-14141: Key: SOLR-14141 (was: LUCENE-9108) Lucene Fields: (was: New) Project: Solr (was: Lucene - Core) Security: Public > eliminate JKS keystore from solr SSL docs > - > > Key: SOLR-14141 > URL: https://issues.apache.org/jira/browse/SOLR-14141 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Robert Muir >Priority: Major > > On the "Enabling SSL" page: > https://lucene.apache.org/solr/guide/8_3/enabling-ssl.html#enabling-ssl > The first step is currently to create a JKS keystore. The next step > immediately converts the JKS keystore into PKCS12, so that openssl can then > be used to extract key material in PEM format for use with curl. > Now that PKCS12 is java's default keystore format, why not omit step 1 > entirely? What am I missing? PKCS12 is a more commonly > understood/standardized format.
[GitHub] [lucene-solr] rmuir merged pull request #1110: SOLR-14138: enable request log via environ var
rmuir merged pull request #1110: SOLR-14138: enable request log via environ var URL: https://github.com/apache/lucene-solr/pull/1110
[jira] [Commented] (SOLR-14138) Fix commented-out RequestLog in jetty.xml to use non-deprecated class
[ https://issues.apache.org/jira/browse/SOLR-14138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17002348#comment-17002348 ] ASF subversion and git services commented on SOLR-14138: Commit 1425d6cbf853a8ab8998f95b6982c065d9bac1c7 in lucene-solr's branch refs/heads/master from Robert Muir [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=1425d6c ] SOLR-14138: enable request log via environ var, remove deprecated jetty class usage, respect SOLR_LOGS_DIR (#1110) User can now set SOLR_REQUESTLOG_ENABLED=true to enable the jetty request log, instead of editing XML. The location of the request logs will respect SOLR_LOGS_DIR if that is set. The deprecated NCSARequestLog is no longer used, instead it uses CustomRequestLog with NCSA_FORMAT. > Fix commented-out RequestLog in jetty.xml to use non-deprecated class > - > > Key: SOLR-14138 > URL: https://issues.apache.org/jira/browse/SOLR-14138 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Robert Muir >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > Currently the jetty request logging is disabled (commented out). > But it can be useful, e.g. since it uses a standard logging format and there > are tools to analyze it by default. Also it can be used to detect some > attacks not otherwise logged anywhere else, since they don't make it to solr > servlet: requests blocked at the jetty level (invalid/malformed requests, > ones filtered by jetty IP filtering, etc). > We should switch it from the deprecated NCSARequestLog class, instead to use > the CustomRequestLog with either NCSA_FORMAT or EXTENDED_NCSA_FORMAT. > {quote} > Deprecated. 
> use CustomRequestLog given format string > CustomRequestLog.EXTENDED_NCSA_FORMAT with a RequestLogWriter > {quote}
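The non-deprecated setup the issue describes could be sketched in jetty.xml roughly as below. This is illustrative only, not the actual committed configuration: the class and constant names (`CustomRequestLog`, `RequestLogWriter`, `NCSA_FORMAT`) are Jetty 9.4's, but the property name and setter layout here are assumptions.

```xml
<!-- Illustrative jetty.xml fragment: CustomRequestLog writing NCSA-format
     lines via a RequestLogWriter, replacing the deprecated NCSARequestLog. -->
<Set name="RequestLog">
  <New class="org.eclipse.jetty.server.CustomRequestLog">
    <Arg>
      <New class="org.eclipse.jetty.server.RequestLogWriter">
        <Set name="filename"><Property name="solr.log.dir" default="logs"/>/yyyy_mm_dd.request.log</Set>
        <Set name="append">true</Set>
        <Set name="retainDays">3</Set>
      </New>
    </Arg>
    <Arg><Get class="org.eclipse.jetty.server.CustomRequestLog" name="NCSA_FORMAT"/></Arg>
  </New>
</Set>
```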
[jira] [Commented] (SOLR-14138) Fix commented-out RequestLog in jetty.xml to use non-deprecated class
[ https://issues.apache.org/jira/browse/SOLR-14138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17002350#comment-17002350 ] ASF subversion and git services commented on SOLR-14138: Commit baeaa56fb27efe41b9c41d35a93d086b2a9d7cb4 in lucene-solr's branch refs/heads/branch_8x from Robert Muir [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=baeaa56 ] SOLR-14138: enable request log via environ var, remove deprecated jetty class usage, respect SOLR_LOGS_DIR (#1110) User can now set SOLR_REQUESTLOG_ENABLED=true to enable the jetty request log, instead of editing XML. The location of the request logs will respect SOLR_LOGS_DIR if that is set. The deprecated NCSARequestLog is no longer used, instead it uses CustomRequestLog with NCSA_FORMAT. > Fix commented-out RequestLog in jetty.xml to use non-deprecated class > - > > Key: SOLR-14138 > URL: https://issues.apache.org/jira/browse/SOLR-14138 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Robert Muir >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > Currently the jetty request logging is disabled (commented out). > But it can be useful, e.g. since it uses a standard logging format and there > are tools to analyze it by default. Also it can be used to detect some > attacks not otherwise logged anywhere else, since they don't make it to solr > servlet: requests blocked at the jetty level (invalid/malformed requests, > ones filtered by jetty IP filtering, etc). > We should switch it from the deprecated NCSARequestLog class, instead to use > the CustomRequestLog with either NCSA_FORMAT or EXTENDED_NCSA_FORMAT. > {quote} > Deprecated. 
> use CustomRequestLog given format string > CustomRequestLog.EXTENDED_NCSA_FORMAT with a RequestLogWriter > {quote}
[jira] [Resolved] (SOLR-14138) Fix commented-out RequestLog in jetty.xml to use non-deprecated class
[ https://issues.apache.org/jira/browse/SOLR-14138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved SOLR-14138. Assignee: Robert Muir Resolution: Fixed > Fix commented-out RequestLog in jetty.xml to use non-deprecated class > - > > Key: SOLR-14138 > URL: https://issues.apache.org/jira/browse/SOLR-14138 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Robert Muir >Assignee: Robert Muir >Priority: Major > Fix For: 8.5 > > Time Spent: 0.5h > Remaining Estimate: 0h > > Currently the jetty request logging is disabled (commented out). > But it can be useful, e.g. since it uses a standard logging format and there > are tools to analyze it by default. Also it can be used to detect some > attacks not otherwise logged anywhere else, since they don't make it to solr > servlet: requests blocked at the jetty level (invalid/malformed requests, > ones filtered by jetty IP filtering, etc). > We should switch it from the deprecated NCSARequestLog class, instead to use > the CustomRequestLog with either NCSA_FORMAT or EXTENDED_NCSA_FORMAT. > {quote} > Deprecated. > use CustomRequestLog given format string > CustomRequestLog.EXTENDED_NCSA_FORMAT with a RequestLogWriter > {quote} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (SOLR-14138) Fix commented-out RequestLog in jetty.xml to use non-deprecated class
[ https://issues.apache.org/jira/browse/SOLR-14138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated SOLR-14138: --- Fix Version/s: 8.5 > Fix commented-out RequestLog in jetty.xml to use non-deprecated class > - > > Key: SOLR-14138 > URL: https://issues.apache.org/jira/browse/SOLR-14138 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Robert Muir >Priority: Major > Fix For: 8.5 > > Time Spent: 0.5h > Remaining Estimate: 0h > > Currently the jetty request logging is disabled (commented out). > But it can be useful, e.g. since it uses a standard logging format and there > are tools to analyze it by default. Also it can be used to detect some > attacks not otherwise logged anywhere else, since they don't make it to solr > servlet: requests blocked at the jetty level (invalid/malformed requests, > ones filtered by jetty IP filtering, etc). > We should switch it from the deprecated NCSARequestLog class, instead to use > the CustomRequestLog with either NCSA_FORMAT or EXTENDED_NCSA_FORMAT. > {quote} > Deprecated. > use CustomRequestLog given format string > CustomRequestLog.EXTENDED_NCSA_FORMAT with a RequestLogWriter > {quote} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Created] (SOLR-14142) Enable jetty's request log by default
Robert Muir created SOLR-14142: -- Summary: Enable jetty's request log by default Key: SOLR-14142 URL: https://issues.apache.org/jira/browse/SOLR-14142 Project: Solr Issue Type: Improvement Security Level: Public (Default Security Level. Issues are Public) Reporter: Robert Muir I'd like to enable the jetty request log by default. This log is now in the correct directory, it no longer uses the deprecated mechanisms (it is asynclogwriter + customformat), etc. See SOLR-14138. This log is in a standard format (NCSA) which is supported by tools out-of-box. It does not contain challenges such as java exceptions and is easy to work with. Without it enabled, solr really has insufficient logging (e.g. no IP addresses). If someone's solr gets hacked, its only fair they at least get to see who did it.
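With the SOLR-14138 change merged, turning the request log on is an environment setting rather than an XML edit. A sketch of the relevant lines in solr.in.sh — the variable names are taken from the commit message, but the directory value is a placeholder:

```bash
# Enable Jetty's NCSA-format request log without editing jetty.xml.
SOLR_REQUESTLOG_ENABLED=true
# When set, request logs are written alongside the other Solr logs.
SOLR_LOGS_DIR=/var/solr/logs
```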
[GitHub] [lucene-solr] andywebb1975 closed pull request #1098: SOLR-13190 - added maxQueryLength parameter
andywebb1975 closed pull request #1098: SOLR-13190 - added maxQueryLength parameter URL: https://github.com/apache/lucene-solr/pull/1098
[GitHub] [lucene-solr] andywebb1975 opened a new pull request #1113: SOLR-14131: adds maxQueryLength option
andywebb1975 opened a new pull request #1113: SOLR-14131: adds maxQueryLength option URL: https://github.com/apache/lucene-solr/pull/1113 # Description Attempting to spellcheck some long query terms can trigger org.apache.lucene.util.automaton.TooComplexToDeterminizeException. # Solution This change (previously discussed in SOLR-13190, and dependent on LUCENE-9102) adds a maxQueryLength option to DirectSolrSpellChecker so that Lucene/Solr can be configured to not attempt to spellcheck terms over a specified length. # Tests A new test checks that a term is spellchecked before maxQueryLength is reduced, and not afterwards. # Checklist Please review the following and check all that apply: - [x] I have reviewed the guidelines for [How to Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms to the standards described there to the best of my ability. - [x] I have created a Jira issue and added the issue ID to my pull request title. - [x] I have given Solr maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended) - [x] I have developed this patch against the `master` branch. - [ ] I have run `ant precommit` and the appropriate test suite. - [x] I have added tests for my changes. - [x] I have added documentation for the [Ref Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) (for Solr changes only). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
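A spellchecker configured this way might look like the solrconfig.xml fragment below. This is an illustrative sketch: it assumes the new option is exposed under the key `maxQueryLength` alongside the existing `minQueryLength`, and the field name and threshold values are placeholders.

```xml
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">teststop</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
    <!-- existing option: terms shorter than this are not spellchecked -->
    <int name="minQueryLength">2</int>
    <!-- option added by this PR: terms longer than this are not spellchecked,
         avoiding TooComplexToDeterminizeException on very long terms -->
    <int name="maxQueryLength">40</int>
  </lst>
</searchComponent>
```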
[jira] [Commented] (LUCENE-9102) Add maxQueryLength option to DirectSpellchecker
[ https://issues.apache.org/jira/browse/LUCENE-9102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17002356#comment-17002356 ] ASF subversion and git services commented on LUCENE-9102: - Commit 663bfe2d8b5b4996806d4fcf4cc09ea12be45464 in lucene-solr's branch refs/heads/master from Bruno Roustant [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=663bfe2 ] LUCENE-9102: update changes.txt > Add maxQueryLength option to DirectSpellchecker > --- > > Key: LUCENE-9102 > URL: https://issues.apache.org/jira/browse/LUCENE-9102 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/spellchecker >Reporter: Andy Webb >Assignee: Bruno Roustant >Priority: Minor > Time Spent: 1h 10m > Remaining Estimate: 0h > > Attempting to spellcheck some long query terms can trigger > {{org.apache.lucene.util.automaton.TooComplexToDeterminizeException}}. This > change (previously discussed in SOLR-13190) adds a {{maxQueryLength}} option > to {{DirectSpellchecker}} so that Lucene can be configured to not attempt to > spellcheck terms over a specified length. > PR: https://github.com/apache/lucene-solr/pull/1103 > Dependent Solr issue: SOLR-14131
[jira] [Updated] (SOLR-14131) Add maxQueryLength option to DirectSolrSpellchecker
[ https://issues.apache.org/jira/browse/SOLR-14131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Webb updated SOLR-14131: - Description: Attempting to spellcheck some long query terms can trigger org.apache.lucene.util.automaton.TooComplexToDeterminizeException. This change (previously discussed in SOLR-13190, and dependent on LUCENE-9102) adds a maxQueryLength option to DirectSolrSpellChecker so that Lucene/Solr can be configured to not attempt to spellcheck terms over a specified length. Here's a PR: https://github.com/apache/lucene-solr/pull/1113 was: Attempting to spellcheck some long query terms can trigger org.apache.lucene.util.automaton.TooComplexToDeterminizeException. This change (previously discussed in SOLR-13190, and dependent on LUCENE-9102) adds a maxQueryLength option to DirectSolrSpellChecker so that Lucene/Solr can be configured to not attempt to spellcheck terms over a specified length. Here's a draft PR: https://github.com/apache/lucene-solr/pull/1112 (I'm struggling writing tests, and we should update the Solr docs too.) > Add maxQueryLength option to DirectSolrSpellchecker > --- > > Key: SOLR-14131 > URL: https://issues.apache.org/jira/browse/SOLR-14131 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: search >Reporter: Andy Webb >Priority: Minor > Time Spent: 50m > Remaining Estimate: 0h > > Attempting to spellcheck some long query terms can trigger > org.apache.lucene.util.automaton.TooComplexToDeterminizeException. This > change (previously discussed in SOLR-13190, and dependent on LUCENE-9102) > adds a maxQueryLength option to DirectSolrSpellChecker so that Lucene/Solr > can be configured to not attempt to spellcheck terms over a specified length. 
> Here's a PR: https://github.com/apache/lucene-solr/pull/1113
[GitHub] [lucene-solr] andywebb1975 commented on a change in pull request #1113: SOLR-14131: adds maxQueryLength option
andywebb1975 commented on a change in pull request #1113: SOLR-14131: adds maxQueryLength option URL: https://github.com/apache/lucene-solr/pull/1113#discussion_r360930320 ## File path: solr/core/src/test/org/apache/solr/spelling/DirectSolrSpellCheckerTest.java ## @@ -79,7 +79,7 @@ public void test() throws Exception { return null; }); } - + Review comment: It's not clear to me what the "super" test above does. As far as I can see, the test runs a spellcheck for "super" but then uses "fob" as the index into suggestions, which will never find an entry.
[GitHub] [lucene-solr] andywebb1975 commented on a change in pull request #1113: SOLR-14131: adds maxQueryLength option
andywebb1975 commented on a change in pull request #1113: SOLR-14131: adds maxQueryLength option URL: https://github.com/apache/lucene-solr/pull/1113#discussion_r360930320 ## File path: solr/core/src/test/org/apache/solr/spelling/DirectSolrSpellCheckerTest.java ## @@ -79,7 +79,7 @@ public void test() throws Exception { return null; }); } - + Review comment: It's not clear to me what the "super" test above is for. As far as I can see, the test runs a spellcheck for "super" but then uses "fob" as the index into suggestions, which will never find an entry.
[jira] [Commented] (LUCENE-9102) Add maxQueryLength option to DirectSpellchecker
[ https://issues.apache.org/jira/browse/LUCENE-9102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17002359#comment-17002359 ] ASF subversion and git services commented on LUCENE-9102: - Commit 361bf78d899433730210bcec1b775b74cbb71664 in lucene-solr's branch refs/heads/branch_8x from Bruno Roustant [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=361bf78 ] LUCENE-9102: update changes.txt > Add maxQueryLength option to DirectSpellchecker > --- > > Key: LUCENE-9102 > URL: https://issues.apache.org/jira/browse/LUCENE-9102 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/spellchecker >Reporter: Andy Webb >Assignee: Bruno Roustant >Priority: Minor > Time Spent: 1h 10m > Remaining Estimate: 0h > > Attempting to spellcheck some long query terms can trigger > {{org.apache.lucene.util.automaton.TooComplexToDeterminizeException}}. This > change (previously discussed in SOLR-13190) adds a {{maxQueryLength}} option > to {{DirectSpellchecker}} so that Lucene can be configured to not attempt to > spellcheck terms over a specified length. > PR: https://github.com/apache/lucene-solr/pull/1103 > Dependent Solr issue: SOLR-14131
[jira] [Updated] (LUCENE-9102) Add maxQueryLength option to DirectSpellchecker
[ https://issues.apache.org/jira/browse/LUCENE-9102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bruno Roustant updated LUCENE-9102: --- Fix Version/s: 8.5 Resolution: Fixed Status: Resolved (was: Patch Available) > Add maxQueryLength option to DirectSpellchecker > --- > > Key: LUCENE-9102 > URL: https://issues.apache.org/jira/browse/LUCENE-9102 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/spellchecker >Reporter: Andy Webb >Assignee: Bruno Roustant >Priority: Minor > Fix For: 8.5 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > Attempting to spellcheck some long query terms can trigger > {{org.apache.lucene.util.automaton.TooComplexToDeterminizeException}}. This > change (previously discussed in SOLR-13190) adds a {{maxQueryLength}} option > to {{DirectSpellchecker}} so that Lucene can be configured to not attempt to > spellcheck terms over a specified length. > PR: https://github.com/apache/lucene-solr/pull/1103 > Dependent Solr issue: SOLR-14131
[jira] [Updated] (SOLR-14131) Add maxQueryLength option to DirectSolrSpellchecker
[ https://issues.apache.org/jira/browse/SOLR-14131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Webb updated SOLR-14131: - Status: Patch Available (was: Open) > Add maxQueryLength option to DirectSolrSpellchecker > --- > > Key: SOLR-14131 > URL: https://issues.apache.org/jira/browse/SOLR-14131 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: search >Reporter: Andy Webb >Priority: Minor > Time Spent: 1h 10m > Remaining Estimate: 0h > > Attempting to spellcheck some long query terms can trigger > org.apache.lucene.util.automaton.TooComplexToDeterminizeException. This > change (previously discussed in SOLR-13190, and dependent on LUCENE-9102) > adds a maxQueryLength option to DirectSolrSpellChecker so that Lucene/Solr > can be configured to not attempt to spellcheck terms over a specified length. > Here's a PR: https://github.com/apache/lucene-solr/pull/1113
[jira] [Commented] (SOLR-14131) Add maxQueryLength option to DirectSolrSpellchecker
[ https://issues.apache.org/jira/browse/SOLR-14131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17002366#comment-17002366 ] Bruno Roustant commented on SOLR-14131: --- [~andywebb1975] you closed the PR#1112. Will you link another PR? > Add maxQueryLength option to DirectSolrSpellchecker > --- > > Key: SOLR-14131 > URL: https://issues.apache.org/jira/browse/SOLR-14131 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: search >Reporter: Andy Webb >Priority: Minor > Time Spent: 1h 10m > Remaining Estimate: 0h > > Attempting to spellcheck some long query terms can trigger > org.apache.lucene.util.automaton.TooComplexToDeterminizeException. This > change (previously discussed in SOLR-13190, and dependent on LUCENE-9102) > adds a maxQueryLength option to DirectSolrSpellChecker so that Lucene/Solr > can be configured to not attempt to spellcheck terms over a specified length. > Here's a PR: https://github.com/apache/lucene-solr/pull/1113
[jira] [Commented] (SOLR-14131) Add maxQueryLength option to DirectSolrSpellchecker
[ https://issues.apache.org/jira/browse/SOLR-14131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17002368#comment-17002368 ] Andy Webb commented on SOLR-14131: -- hi Bruno - thanks for committing the Lucene change! I've linked [PR 1113|https://github.com/apache/lucene-solr/pull/1113] as the earlier one got messy - but with a colleague's help I think I've got a decent test for the change - let me know what you think. Andy > Add maxQueryLength option to DirectSolrSpellchecker > --- > > Key: SOLR-14131 > URL: https://issues.apache.org/jira/browse/SOLR-14131 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: search >Reporter: Andy Webb >Priority: Minor > Time Spent: 1h 10m > Remaining Estimate: 0h > > Attempting to spellcheck some long query terms can trigger > org.apache.lucene.util.automaton.TooComplexToDeterminizeException. This > change (previously discussed in SOLR-13190, and dependent on LUCENE-9102) > adds a maxQueryLength option to DirectSolrSpellChecker so that Lucene/Solr > can be configured to not attempt to spellcheck terms over a specified length. > Here's a PR: https://github.com/apache/lucene-solr/pull/1113
[GitHub] [lucene-solr] madrob commented on a change in pull request #1113: SOLR-14131: adds maxQueryLength option
madrob commented on a change in pull request #1113: SOLR-14131: adds maxQueryLength option URL: https://github.com/apache/lucene-solr/pull/1113#discussion_r360934384 ## File path: solr/core/src/test/org/apache/solr/spelling/DirectSolrSpellCheckerTest.java ## @@ -88,6 +88,45 @@ public void testOnlyMorePopularWithExtendedResults() throws Exception { "//lst[@name='spellcheck']/lst[@name='suggestions']/lst[@name='fox']/arr[@name='suggestion']/lst/int[@name='freq']=2", "//lst[@name='spellcheck']/bool[@name='correctlySpelled']='true'" ); - } + } + + @Test + public void testMaxQueryLength() throws Exception { +testMaxQueryLength(true); +testMaxQueryLength(false); + } + + private void testMaxQueryLength(Boolean limitQueryLength) throws Exception { + +DirectSolrSpellChecker checker = new DirectSolrSpellChecker(); +NamedList spellchecker = new NamedList(); +spellchecker.add("classname", DirectSolrSpellChecker.class.getName()); +spellchecker.add(SolrSpellChecker.FIELD, "teststop"); +spellchecker.add(DirectSolrSpellChecker.MINQUERYLENGTH, 2); + +// demonstrate that "anothar" is not corrected when maxQueryLength is set to a small number +if (limitQueryLength) spellchecker.add(DirectSolrSpellChecker.MAXQUERYLENGTH, 4); + +SolrCore core = h.getCore(); +checker.init(spellchecker, core); + +h.getCore().withSearcher(searcher -> { + Collection tokens = queryConverter.convert("anothar"); + SpellingOptions spellOpts = new SpellingOptions(tokens, searcher.getIndexReader()); + SpellingResult result = checker.getSuggestions(spellOpts); + assertTrue("result should not be null", result != null); Review comment: minor nit: we can get cleaner test failures by using `assertNotNull` here. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] [lucene-solr] madrob commented on a change in pull request #1113: SOLR-14131: adds maxQueryLength option
madrob commented on a change in pull request #1113: SOLR-14131: adds maxQueryLength option URL: https://github.com/apache/lucene-solr/pull/1113#discussion_r360938182

## File path: solr/core/src/test/org/apache/solr/spelling/DirectSolrSpellCheckerTest.java ##
@@ -79,7 +79,7 @@ public void test() throws Exception {

Review comment: Yes, and it asserts that there are no results. It's the negative test case for the spell checking match.
[GitHub] [lucene-solr] madrob commented on a change in pull request #1113: SOLR-14131: adds maxQueryLength option
madrob commented on a change in pull request #1113: SOLR-14131: adds maxQueryLength option URL: https://github.com/apache/lucene-solr/pull/1113#discussion_r360934621

## File path: solr/core/src/test/org/apache/solr/spelling/DirectSolrSpellCheckerTest.java ##
+        assertTrue(entry.getKey() + " is not equal to 'another'", entry.getKey().equals("another") == true);

Review comment: minor nit: use `assertEquals` here
[jira] [Updated] (SOLR-14142) Enable jetty's request log by default
[ https://issues.apache.org/jira/browse/SOLR-14142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated SOLR-14142: --- Fix Version/s: master (9.0) > Enable jetty's request log by default > - > > Key: SOLR-14142 > URL: https://issues.apache.org/jira/browse/SOLR-14142 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Robert Muir >Priority: Major > Fix For: master (9.0) > > > I'd like to enable the jetty request log by default. > This log is now in the correct directory, and it no longer uses the deprecated > mechanisms (it is asynclogwriter + customformat); see SOLR-14138. > This log is in a standard format (NCSA) which is supported by tools > out of the box. It does not contain challenges such as java exceptions and is > easy to work with. Without it enabled, solr really has insufficient logging > (e.g. no IP addresses). > If someone's solr gets hacked, it's only fair they at least get to see who did > it.
[GitHub] [lucene-solr] bruno-roustant commented on a change in pull request #1113: SOLR-14131: adds maxQueryLength option
bruno-roustant commented on a change in pull request #1113: SOLR-14131: adds maxQueryLength option URL: https://github.com/apache/lucene-solr/pull/1113#discussion_r360947790

## File path: solr/core/src/test/org/apache/solr/spelling/DirectSolrSpellCheckerTest.java ##
@@ -79,7 +79,7 @@ public void test() throws Exception {

Review comment: Good point. Indeed the test at line 77 should be fixed to use spellOpts.tokens instead and to expect empty suggestions. Would you like to fix it?
[jira] [Updated] (SOLR-14142) Enable jetty's request log by default
[ https://issues.apache.org/jira/browse/SOLR-14142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated SOLR-14142: --- Attachment: SOLR-14142.patch > Enable jetty's request log by default > - > > Attachments: SOLR-14142.patch
[GitHub] [lucene-solr] bruno-roustant commented on a change in pull request #1113: SOLR-14131: adds maxQueryLength option
bruno-roustant commented on a change in pull request #1113: SOLR-14131: adds maxQueryLength option URL: https://github.com/apache/lucene-solr/pull/1113#discussion_r360949962

## File path: solr/core/src/test/org/apache/solr/spelling/DirectSolrSpellCheckerTest.java ##
+        Map.Entry entry = suggestions.entrySet().iterator().next();

Review comment: Maybe we could insert an assertFalse(suggestions.isEmpty()), otherwise the line below will throw a NoSuchElementException, which is less nice in a test.
[GitHub] [lucene-solr] bruno-roustant commented on a change in pull request #1113: SOLR-14131: adds maxQueryLength option
bruno-roustant commented on a change in pull request #1113: SOLR-14131: adds maxQueryLength option URL: https://github.com/apache/lucene-solr/pull/1113#discussion_r360949352

## File path: solr/core/src/test/org/apache/solr/spelling/DirectSolrSpellCheckerTest.java ##
+      Map suggestions = result.get(tokens.iterator().next());
+      assertTrue("suggestions should not be null", suggestions != null);

Review comment: assertNotNull?
[GitHub] [lucene-solr] bruno-roustant commented on a change in pull request #1113: SOLR-14131: adds maxQueryLength option
bruno-roustant commented on a change in pull request #1113: SOLR-14131: adds maxQueryLength option URL: https://github.com/apache/lucene-solr/pull/1113#discussion_r360949293

## File path: solr/core/src/test/org/apache/solr/spelling/DirectSolrSpellCheckerTest.java ##
+      SpellingResult result = checker.getSuggestions(spellOpts);
+      assertTrue("result should not be null", result != null);

Review comment: assertNotNull?
[GitHub] [lucene-solr] bruno-roustant commented on a change in pull request #1113: SOLR-14131: adds maxQueryLength option
bruno-roustant commented on a change in pull request #1113: SOLR-14131: adds maxQueryLength option URL: https://github.com/apache/lucene-solr/pull/1113#discussion_r360949433

## File path: solr/core/src/test/org/apache/solr/spelling/DirectSolrSpellCheckerTest.java ##
+        assertTrue(entry.getKey() + " is not equal to 'another'", entry.getKey().equals("another") == true);

Review comment: assertEquals
[GitHub] [lucene-solr] bruno-roustant commented on a change in pull request #1113: SOLR-14131: adds maxQueryLength option
bruno-roustant commented on a change in pull request #1113: SOLR-14131: adds maxQueryLength option URL: https://github.com/apache/lucene-solr/pull/1113#discussion_r360949198

## File path: solr/core/src/test/org/apache/solr/spelling/DirectSolrSpellCheckerTest.java ##
+    NamedList spellchecker = new NamedList();

Review comment: We should use generics here: NamedList spellchecker = new NamedList<>();
[GitHub] [lucene-solr] bruno-roustant commented on a change in pull request #1113: SOLR-14131: adds maxQueryLength option
bruno-roustant commented on a change in pull request #1113: SOLR-14131: adds maxQueryLength option URL: https://github.com/apache/lucene-solr/pull/1113#discussion_r360950592

## File path: solr/core/src/test/org/apache/solr/spelling/DirectSolrSpellCheckerTest.java ##

Review comment: (comment race)
[GitHub] [lucene-solr] bruno-roustant commented on a change in pull request #1113: SOLR-14131: adds maxQueryLength option
bruno-roustant commented on a change in pull request #1113: SOLR-14131: adds maxQueryLength option URL: https://github.com/apache/lucene-solr/pull/1113#discussion_r360950705

## File path: solr/core/src/test/org/apache/solr/spelling/DirectSolrSpellCheckerTest.java ##
+        assertTrue(entry.getKey() + " is not equal to 'another'", entry.getKey().equals("another") == true);

Review comment: (comment race)
[jira] [Commented] (SOLR-13890) Add postfilter support to {!terms} queries
[ https://issues.apache.org/jira/browse/SOLR-13890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17002388#comment-17002388 ] Jason Gerlowski commented on SOLR-13890: bq. I highly doubt the PostFilter abstraction somehow offers a perf benefit in your benchmark that cannot be achieved with TwoPhaseIterator I'm leaning on your correction a bit here as you're more familiar with the Lucene code than I am. But as I read the TPI implementation for DocValuesTermsQuery, I see one reason why a postfilter impl might be faster (other than segment-level vs top-level). The TPI "approximation" for DocValuesTermsQuery is the unfiltered doc-values structure for the field. As a result, TPI {{matches()}} is going to be called on all documents that have any value at all for the field in question. Under a post-filter implementation, the bitset lookup is (potentially) called much less frequently, as we only look up values for docs that have matched all the other (non-postfilter) query clauses. Does that make sense, or am I off-base [~dsmiley]? In either case, this is hypothetical. The real proof is in a perf experiment. I'm putting one together now to share soon. bq. Though I don't know whether the details of my test would have tripped whatever heuristics Lucene uses to turn TPI on/off. As best as I can tell from the [code|https://github.com/apache/lucene-solr/blob/174cc63bad411eace196a6c7028bdd24864fefed/lucene/sandbox/src/java/org/apache/lucene/search/DocValuesTermsQuery.java#L218], it looks like DVTQ always uses TPI processing. So there's no particular concern about ensuring that logic is triggered when I perf test. > Add postfilter support to {!terms} queries > -- > > Key: SOLR-13890 > URL: https://issues.apache.org/jira/browse/SOLR-13890 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: query parsers >Affects Versions: master (9.0) >Reporter: Jason Gerlowski >Assignee: Jason Gerlowski >Priority: Major > Attachments: SOLR-13890.patch, SOLR-13890.patch, SOLR-13890.patch > > > There are some use-cases where it'd be nice if the "terms" qparser created a > query that could be run as a postfilter. Particularly, when users are > checking for hundreds or thousands of terms, a postfilter implementation can > be more performant than the standard processing. > With this issue, I'd like to propose a post-filter implementation for the > {{docValuesTermsFilter}} "method". Postfilter creation can use a > SortedSetDocValues object to populate a DV bitset with the "terms" being > checked for. Each document run through the post-filter can look at its > doc-values for the field in question and check them efficiently against the > constructed bitset.
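The cost difference Jason hypothesizes can be illustrated with a toy model (this is NOT Lucene code; the class and method names below are invented for illustration): a two-phase iterator whose approximation is "every doc with a value for the field" consults matches() once per such doc, while a post-filter only performs its bitset lookup for docs that already passed the other query clauses.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Toy cost model for the TPI-vs-postfilter comparison discussed above.
// NOT Lucene code; names are invented for illustration only.
public class PostFilterCostSketch {

    // TPI-style: the approximation is the unfiltered doc-values structure,
    // so matches() is consulted for every doc that has any value for the field.
    static int tpiLookups(int docsWithFieldValue) {
        return docsWithFieldValue;
    }

    // Post-filter style: only docs that matched all other query clauses
    // reach the filter, so the bitset lookup runs once per such doc.
    static int postFilterLookups(Set<Integer> docsMatchingOtherClauses) {
        return docsMatchingOtherClauses.size();
    }

    public static void main(String[] args) {
        int docsWithFieldValue = 1_000_000;           // many docs carry the field
        Set<Integer> otherClauseMatches = new HashSet<>(Arrays.asList(3, 42, 7_654));
        System.out.println("TPI matches() calls: " + tpiLookups(docsWithFieldValue));
        System.out.println("post-filter lookups: " + postFilterLookups(otherClauseMatches));
    }
}
```

When the other clauses are selective, the post-filter performs orders of magnitude fewer lookups in this model; as the comment notes, the real answer still needs a perf experiment.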
[jira] [Commented] (SOLR-13890) Add postfilter support to {!terms} queries
[ https://issues.apache.org/jira/browse/SOLR-13890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17002416#comment-17002416 ] Joel Bernstein commented on SOLR-13890: --- I dug into this pretty deeply and I believe there is a large advantage to the top level doc values approach when there is a large number of terms. The reason is that *MultiSortedSetDocValues.lookupOrd* (in MultiDocValues) is really clever, so the overhead of doing the top level term lookup is much less than doing the segment by segment term lookups. Using the top level ordinals inside of the scorer would be possible also, but seemed kind of awkward. But, in theory, using top level ordinals in the scorer would get similar performance to this patch.
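The top-level-ordinal idea Joel describes can be sketched in miniature (a toy, not Lucene's MultiDocValues/OrdinalMap implementation; all names below are invented): per-segment term dictionaries are unified into one global sorted dictionary, the query's terms are resolved against it once up front, and each per-document membership check then becomes a cheap bitset test on global ordinals instead of a term lookup.

```java
import java.util.*;

// Toy sketch of "top-level ordinals"; NOT Lucene's MultiDocValues. Names invented.
public class GlobalOrdinalsSketch {

    // Merge per-segment sorted term dictionaries into one global sorted dictionary.
    static List<String> globalDictionary(List<List<String>> segmentTerms) {
        TreeSet<String> merged = new TreeSet<>();
        for (List<String> seg : segmentTerms) merged.addAll(seg);
        return new ArrayList<>(merged);
    }

    // Resolve the query's terms to global ordinals once, up front.
    static BitSet queryOrdinals(List<String> globalDict, Set<String> queryTerms) {
        BitSet ords = new BitSet(globalDict.size());
        for (int ord = 0; ord < globalDict.size(); ord++) {
            if (queryTerms.contains(globalDict.get(ord))) ords.set(ord);
        }
        return ords;
    }

    // Per-document check: one bitset test per ordinal instead of a term lookup.
    static boolean matches(BitSet queryOrds, int[] docGlobalOrds) {
        for (int ord : docGlobalOrds) {
            if (queryOrds.get(ord)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> dict = globalDictionary(Arrays.asList(
                Arrays.asList("apple", "fox"), Arrays.asList("another", "fox")));
        BitSet ords = queryOrdinals(dict, new HashSet<>(Arrays.asList("fox")));
        System.out.println(dict);                        // the merged, sorted dictionary
        System.out.println(matches(ords, new int[]{2})); // a doc holding the ord of "fox"
    }
}
```

The expensive step (term-to-ordinal resolution) happens once per query rather than once per segment per term, which is where the advantage grows with the number of terms.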
[jira] [Comment Edited] (SOLR-13890) Add postfilter support to {!terms} queries
[ https://issues.apache.org/jira/browse/SOLR-13890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17002416#comment-17002416 ] Joel Bernstein edited comment on SOLR-13890 at 12/23/19 6:09 PM: minor wording fixes to the comment above.
[jira] [Commented] (SOLR-13890) Add postfilter support to {!terms} queries
[ https://issues.apache.org/jira/browse/SOLR-13890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17002420#comment-17002420 ] Joel Bernstein commented on SOLR-13890: --- We have code somewhat similar to this patch deployed with a cross-core join that provides sub-second performance with 50,000 join terms. We will not achieve that with the terms query because 50,000 terms is too large to pass in efficiently, but the term lookups are scalable with the top level ordinal approach.
[jira] [Comment Edited] (SOLR-13890) Add postfilter support to {!terms} queries
[ https://issues.apache.org/jira/browse/SOLR-13890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17002416#comment-17002416 ] Joel Bernstein edited comment on SOLR-13890 at 12/23/19 6:15 PM: - I dug into this pretty deeply and I believe there is a large advantage to the top-level doc values approach when there is a large number of terms. The reason is that *MultiSortedSetDocValues.lookupOrd* (in MultiDocValues) is really clever, so the overhead of doing the top-level term lookup is much less than doing the segment-by-segment term lookups. Using the top-level ordinals inside the scorer would also be possible, but seemed kind of awkward. In theory, though, using top-level ordinals in the scorer would get similar performance to this patch. > Add postfilter support to {!terms} queries > -- > > Key: SOLR-13890 > URL: https://issues.apache.org/jira/browse/SOLR-13890 > Project: Solr > Issue Type: Improvement > Security Level: Public (Default Security Level. Issues are Public) > Components: query parsers > Affects Versions: master (9.0) > Reporter: Jason Gerlowski > Assignee: Jason Gerlowski > Priority: Major > Attachments: SOLR-13890.patch, SOLR-13890.patch, SOLR-13890.patch > > > There are some use-cases where it'd be nice if the "terms" qparser created a query that could be run as a postfilter. Particularly, when users are checking for hundreds or thousands of terms, a postfilter implementation can be more performant than the standard processing. > With this issue, I'd like to propose a post-filter implementation for the {{docValuesTermsFilter}} "method". Postfilter creation can use a SortedSetDocValues object to populate a DV bitset with the "terms" being checked for. Each document run through the post-filter can look at its doc-values for the field in question and check them efficiently against the constructed bitset. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
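Joel's point about *MultiSortedSetDocValues.lookupOrd* can be illustrated with a Lucene-free sketch: if a top-level (global) ordinal can be routed to one owning segment with a single binary search, only that segment's term dictionary is consulted, instead of re-resolving the term in every segment. This is a deliberate simplification (the real OrdinalMap behind MultiSortedSetDocValues is cleverer, since segments share terms); all names and numbers here are illustrative:

```java
import java.util.Arrays;

public class GlobalOrdSketch {
    // Start of each segment's ordinal range in the global (top-level) ord space,
    // e.g. segment 0 owns global ords [0,100), segment 1 owns [100,250), ...
    static final long[] ORD_BASES = {0, 100, 250, 400};

    // Map a global ord to {segment, ord-within-segment} with one binary search,
    // rather than performing a term lookup in every segment.
    static long[] segmentAndOrd(long globalOrd) {
        int idx = Arrays.binarySearch(ORD_BASES, globalOrd);
        int segment = idx >= 0 ? idx : -idx - 2; // insertion point minus one
        return new long[] {segment, globalOrd - ORD_BASES[segment]};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(segmentAndOrd(150))); // [1, 50]
        System.out.println(Arrays.toString(segmentAndOrd(400))); // [3, 0]
    }
}
```

The sketch only shows why one top-level lookup is cheaper than N per-segment lookups; the real structure additionally deduplicates terms that appear in multiple segments.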
[jira] [Commented] (SOLR-13890) Add postfilter support to {!terms} queries
[ https://issues.apache.org/jira/browse/SOLR-13890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17002424#comment-17002424 ] Joel Bernstein commented on SOLR-13890: --- The other really big aspect of this is caching. Even though the scorer-based filter can be fast if it's applied with the main query, in Solr that's not going to happen. The reason is the filter cache: Solr will apply the filter against the entire index and create a DocSet to cache. Our filter cache is top-level, so it gets dumped after a single document is loaded. So in scenarios where there is lots of indexing going on, the filter cache becomes problematic. There are ways around this issue, like turning off caching using local params, or not using filter queries, but these approaches are not what users typically do with a filter. So the postfilter behavior (not cached in the filter cache) provides the best solution for certain situations where the filter cache is problematic.
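The docValuesTermsFilter approach proposed in this issue boils down to two steps: resolve each query term to its ordinal once and set those ordinals in a bitset, then test each candidate document's ordinals against that bitset. A Lucene-free sketch of the idea, with a plain sorted list standing in for the field's SortedSetDocValues term dictionary (all names and data are illustrative):

```java
import java.util.BitSet;
import java.util.List;

public class DvPostFilterSketch {
    // Sorted term dictionary standing in for the field's SortedSetDocValues.
    static final List<String> TERM_DICT = List.of("apple", "banana", "cherry", "date");

    // One-time setup: build a bitset over term ordinals for the query terms.
    static BitSet buildOrdBitSet(List<String> queryTerms) {
        BitSet ords = new BitSet(TERM_DICT.size());
        for (String t : queryTerms) {
            int ord = TERM_DICT.indexOf(t); // stands in for lookupTerm() in the DV API
            if (ord >= 0) ords.set(ord);
        }
        return ords;
    }

    // Per-document check: does any of the doc's ordinals hit the query bitset?
    static boolean matches(int[] docOrds, BitSet queryOrds) {
        for (int ord : docOrds) {
            if (queryOrds.get(ord)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        BitSet q = buildOrdBitSet(List.of("banana", "date"));
        System.out.println(matches(new int[]{0, 1}, q)); // doc has "apple","banana" -> true
        System.out.println(matches(new int[]{2}, q));    // doc has only "cherry"   -> false
    }
}
```

The point of the design is that string comparisons happen only during setup; per-document work is ordinal-sized bitset reads, which stays cheap even with thousands of query terms.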
[jira] [Commented] (SOLR-14138) Fix commented-out RequestLog in jetty.xml to use non-deprecated class
[ https://issues.apache.org/jira/browse/SOLR-14138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17002438#comment-17002438 ] ASF subversion and git services commented on SOLR-14138: Commit 403fd05646c32981ca15637678602eb12c5239d7 in lucene-solr's branch refs/heads/master from Robert Muir [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=403fd05 ] SOLR-14138: changes.txt > Fix commented-out RequestLog in jetty.xml to use non-deprecated class > - > > Key: SOLR-14138 > URL: https://issues.apache.org/jira/browse/SOLR-14138 > Project: Solr > Issue Type: Improvement > Security Level: Public (Default Security Level. Issues are Public) > Reporter: Robert Muir > Assignee: Robert Muir > Priority: Major > Fix For: 8.5 > > Time Spent: 0.5h > Remaining Estimate: 0h > > Currently the jetty request logging is disabled (commented out). > But it can be useful, e.g. since it uses a standard logging format and there are tools to analyze it by default. It can also be used to detect some attacks not otherwise logged anywhere else, since they don't make it to the solr servlet: requests blocked at the jetty level (invalid/malformed requests, ones filtered by jetty IP filtering, etc.). > We should switch from the deprecated NCSARequestLog class to CustomRequestLog with either NCSA_FORMAT or EXTENDED_NCSA_FORMAT. > {quote} > Deprecated. > use CustomRequestLog given format string CustomRequestLog.EXTENDED_NCSA_FORMAT with a RequestLogWriter > {quote}
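For reference, the kind of jetty.xml fragment the issue is asking for might look like the following. This is a hypothetical sketch: the element syntax, ids, and log path are not taken from the actual commit and should be checked against the documentation of the Jetty version actually bundled with Solr.

```xml
<!-- Hypothetical sketch: enable the non-deprecated CustomRequestLog in jetty.xml -->
<Set name="requestLog">
  <New id="RequestLog" class="org.eclipse.jetty.server.CustomRequestLog">
    <!-- First arg: where to write the log -->
    <Arg>
      <New class="org.eclipse.jetty.server.RequestLogWriter">
        <Set name="filename">logs/yyyy_mm_dd.request.log</Set>
        <Set name="retainDays">3</Set>
      </New>
    </Arg>
    <!-- Second arg: the format string; here the extended NCSA constant -->
    <Arg>
      <Get class="org.eclipse.jetty.server.CustomRequestLog" name="EXTENDED_NCSA_FORMAT"/>
    </Arg>
  </New>
</Set>
```

Using the NCSA format means standard log-analysis tooling works on the output unchanged, which is the motivation stated in the issue.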
[jira] [Commented] (SOLR-14138) Fix commented-out RequestLog in jetty.xml to use non-deprecated class
[ https://issues.apache.org/jira/browse/SOLR-14138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17002439#comment-17002439 ] ASF subversion and git services commented on SOLR-14138: Commit f1a674717a3c97784826b1c1b5fb2bb1cdc9d581 in lucene-solr's branch refs/heads/branch_8x from Robert Muir [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=f1a6747 ] SOLR-14138: changes.txt
[jira] [Created] (LUCENE-9109) Use Java 9+ StackWalker to implement TestSecurityManager's detection of JVM exit
Uwe Schindler created LUCENE-9109: - Summary: Use Java 9+ StackWalker to implement TestSecurityManager's detection of JVM exit Key: LUCENE-9109 URL: https://issues.apache.org/jira/browse/LUCENE-9109 Project: Lucene - Core Issue Type: Improvement Components: modules/test-framework Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: master (9.0) This is just a small improvement in Lucene/Solr master (Java 11) to detect exit of the JVM in our test framework. There are other places in Lucene that use ineffective ways to inspect the stack trace. This one optimizes the implementation of TestSecurityManager#checkExit(status) to disallow all JVM exits outside of the official test runner by using StackWalker. In addition, this needs no additional permissions, because we do not instruct StackWalker to fetch all the crazy stuff like Class instances of stack elements. The way this works is: walk through the stack trace: - skip all internal frames (those which come before the actual exit call) - skip all frames with the actual exit call - limit to one more frame (the method calling System.exit()) - check if that remaining frame is on our whitelist This can only be committed to master (9.0), as it requires Java 9.
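The four walk steps above can be sketched in pure JDK 9+ code. This is a rough illustration, not the actual patch: the whitelist entry and class names are invented, and for simplicity it treats "the exit call" as a single frame, where the real check has to skip a run of Runtime/System/SecurityManager exit frames.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class ExitCheckSketch {
    // Hypothetical whitelist: the only method allowed to call System.exit().
    static final Set<String> EXIT_WHITELIST = Set.of("org.example.TestRunner.main");

    // Walk the current stack: drop the frames that come before the exit call,
    // drop the exit frame itself, and keep only the direct caller.
    static boolean exitAllowed() {
        return StackWalker.getInstance().walk(frames -> {
            List<String> caller = frames
                .map(f -> f.getClassName() + "." + f.getMethodName())
                .dropWhile(name -> !name.equals("java.lang.System.exit")) // skip internal frames
                .skip(1)   // skip the exit call itself
                .limit(1)  // only the method calling System.exit() matters
                .collect(Collectors.toList());
            return !caller.isEmpty() && EXIT_WHITELIST.contains(caller.get(0));
        });
    }

    public static void main(String[] args) {
        // No System.exit() on this stack, so no caller frame is found.
        System.out.println(exitAllowed());
    }
}
```

The "needs no additional permissions" remark corresponds to using the default `StackWalker.getInstance()`: without the `RETAIN_CLASS_REFERENCE` option, no `getStackWalkerWithClassReference` runtime permission is required.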
[GitHub] [lucene-solr] uschindler opened a new pull request #1114: LUCENE-9109: Use stack walker to implement TestSecurityManager's detection of test JVM exit
uschindler opened a new pull request #1114: LUCENE-9109: Use stack walker to implement TestSecurityManager's detection of test JVM exit URL: https://github.com/apache/lucene-solr/pull/1114 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Commented] (LUCENE-9109) Use Java 9+ StackWalker to implement TestSecurityManager's detection of JVM exit
[ https://issues.apache.org/jira/browse/LUCENE-9109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17002443#comment-17002443 ] Uwe Schindler commented on LUCENE-9109: --- We should look at other places calling Thread.currentThread().getStackTrace() or throwing an exception just to get a stack trace.
[jira] [Created] (SOLR-14143) Add request-logging to securing solr page
Robert Muir created SOLR-14143: -- Summary: Add request-logging to securing solr page Key: SOLR-14143 URL: https://issues.apache.org/jira/browse/SOLR-14143 Project: Solr Issue Type: Improvement Security Level: Public (Default Security Level. Issues are Public) Reporter: Robert Muir This functionality was cleaned up in SOLR-14138, and for a major release I've proposed to turn it on by default in SOLR-14142. But for now, I think the "securing solr" page should instruct how to turn this on. Hopefully, if we fix the default in SOLR-14142, this paragraph can simply go away (I think it is an expert choice to not want to log such a basic thing). There is some overlap with "audit logging", but the request log is always more complete, since it logs things that never even make it to solr (as well as 4xx denied by solr itself, of course). You can see the differences by running a simple nmap script scan of your solr instance or similar.
[jira] [Commented] (SOLR-14143) Add request-logging to securing solr page
[ https://issues.apache.org/jira/browse/SOLR-14143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17002448#comment-17002448 ] Robert Muir commented on SOLR-14143: Simple patch: since it isn't the default, I really want to keep it short and just make sure people are aware of it.
[jira] [Updated] (SOLR-14143) Add request-logging to securing solr page
[ https://issues.apache.org/jira/browse/SOLR-14143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated SOLR-14143: --- Attachment: SOLR-14143.patch
[GitHub] [lucene-solr] uschindler commented on issue #1114: LUCENE-9109: Use stack walker to implement TestSecurityManager's detection of test JVM exit
uschindler commented on issue #1114: LUCENE-9109: Use stack walker to implement TestSecurityManager's detection of test JVM exit URL: https://github.com/apache/lucene-solr/pull/1114#issuecomment-568567830 You are right: > If there is a security manager, and this thread is not the current thread, then the security manager's checkPermission method is called with a RuntimePermission("getStackTrace") permission to see if it's ok to get the stack trace. As we were only looking at the current thread, it was useless to have the privileged context. Not sure why we had the permission stuff.
[jira] [Commented] (SOLR-13817) Deprecate and remove legacy SolrCache implementations
[ https://issues.apache.org/jira/browse/SOLR-13817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17002460#comment-17002460 ] David Smiley commented on SOLR-13817: - I'm looking at our {{_default}} configSet and I see class= all over the place for the caches. Shouldn't they have been removed? > Deprecate and remove legacy SolrCache implementations > - > > Key: SOLR-13817 > URL: https://issues.apache.org/jira/browse/SOLR-13817 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Andrzej Bialecki >Assignee: Andrzej Bialecki >Priority: Major > Fix For: master (9.0), 8.4 > > Attachments: SOLR-13817-8x.patch, SOLR-13817-master.patch > > > Now that SOLR-8241 has been committed I propose to deprecate other cache > implementations in 8x and remove them altogether from 9.0, in order to reduce > confusion and maintenance costs.
[GitHub] [lucene-solr] uschindler edited a comment on issue #1114: LUCENE-9109: Use stack walker to implement TestSecurityManager's detection of test JVM exit
uschindler edited a comment on issue #1114: LUCENE-9109: Use stack walker to implement TestSecurityManager's detection of test JVM exit URL: https://github.com/apache/lucene-solr/pull/1114#issuecomment-568567830 You are right: > If there is a security manager, and this thread is not the current thread, then the security manager's checkPermission method is called with a RuntimePermission("getStackTrace") permission to see if it's ok to get the stack trace. As we were only looking at the current thread, it was useless to have the privileged context. Not sure why we had the permission stuff.
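For context, a minimal sketch of the StackWalker-based approach discussed in this PR: walking only the current thread's stack needs no RuntimePermission("getStackTrace") (that permission check only applies when inspecting *another* thread via Thread.getStackTrace()). The class and method names below are illustrative, not the actual TestSecurityManager code:

```java
// Hypothetical sketch, not the real PR code: detect whether a given
// class/method (e.g. a test's call into System.exit) appears on the
// current thread's stack, using the Java 9+ StackWalker API.
public class ExitCheck {

    // True if any frame on the current stack matches className.methodName.
    // StackWalker only ever inspects the calling thread, so no
    // getStackTrace permission is involved.
    static boolean stackContains(String className, String methodName) {
        return StackWalker.getInstance().walk(frames ->
            frames.anyMatch(f -> f.getClassName().equals(className)
                              && f.getMethodName().equals(methodName)));
    }
}
```

Unlike the old `Thread.currentThread().getStackTrace()` approach, the walker streams frames lazily, so the scan can stop at the first match.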
[GitHub] [lucene-solr] andywebb1975 commented on issue #1113: SOLR-14131: adds maxQueryLength option
andywebb1975 commented on issue #1113: SOLR-14131: adds maxQueryLength option URL: https://github.com/apache/lucene-solr/pull/1113#issuecomment-568568657 Thanks Bruno and Mike - I've submitted some updates, could you take another look please?
[GitHub] [lucene-solr] andywebb1975 commented on a change in pull request #1113: SOLR-14131: adds maxQueryLength option
andywebb1975 commented on a change in pull request #1113: SOLR-14131: adds maxQueryLength option URL: https://github.com/apache/lucene-solr/pull/1113#discussion_r360989610 ## File path: solr/core/src/test/org/apache/solr/spelling/DirectSolrSpellCheckerTest.java ## @@ -88,6 +88,45 @@ public void testOnlyMorePopularWithExtendedResults() throws Exception { "//lst[@name='spellcheck']/lst[@name='suggestions']/lst[@name='fox']/arr[@name='suggestion']/lst/int[@name='freq']=2", "//lst[@name='spellcheck']/bool[@name='correctlySpelled']='true'" ); - } + } + + @Test + public void testMaxQueryLength() throws Exception { +testMaxQueryLength(true); +testMaxQueryLength(false); + } + + private void testMaxQueryLength(Boolean limitQueryLength) throws Exception { + +DirectSolrSpellChecker checker = new DirectSolrSpellChecker(); +NamedList spellchecker = new NamedList(); +spellchecker.add("classname", DirectSolrSpellChecker.class.getName()); +spellchecker.add(SolrSpellChecker.FIELD, "teststop"); +spellchecker.add(DirectSolrSpellChecker.MINQUERYLENGTH, 2); + +// demonstrate that "anothar" is not corrected when maxQueryLength is set to a small number +if (limitQueryLength) spellchecker.add(DirectSolrSpellChecker.MAXQUERYLENGTH, 4); + +SolrCore core = h.getCore(); +checker.init(spellchecker, core); + +h.getCore().withSearcher(searcher -> { + Collection tokens = queryConverter.convert("anothar"); + SpellingOptions spellOpts = new SpellingOptions(tokens, searcher.getIndexReader()); + SpellingResult result = checker.getSuggestions(spellOpts); + assertTrue("result should not be null", result != null); + Map suggestions = result.get(tokens.iterator().next()); + assertTrue("suggestions should not be null", suggestions != null); + + if (limitQueryLength) { +assertTrue("suggestions should be empty", suggestions.isEmpty()); + } else { +Map.Entry entry = suggestions.entrySet().iterator().next(); Review comment: done!
[GitHub] [lucene-solr] andywebb1975 commented on a change in pull request #1113: SOLR-14131: adds maxQueryLength option
andywebb1975 commented on a change in pull request #1113: SOLR-14131: adds maxQueryLength option URL: https://github.com/apache/lucene-solr/pull/1113#discussion_r360989706 ## File path: solr/core/src/test/org/apache/solr/spelling/DirectSolrSpellCheckerTest.java ## @@ -79,7 +79,7 @@ public void test() throws Exception { return null; }); } - + Review comment: I think it's clearer what's going on there now!
[GitHub] [lucene-solr] andywebb1975 commented on a change in pull request #1113: SOLR-14131: adds maxQueryLength option
andywebb1975 commented on a change in pull request #1113: SOLR-14131: adds maxQueryLength option URL: https://github.com/apache/lucene-solr/pull/1113#discussion_r360990276 ## File path: solr/core/src/test/org/apache/solr/spelling/DirectSolrSpellCheckerTest.java ## @@ -88,6 +88,45 @@ public void testOnlyMorePopularWithExtendedResults() throws Exception { "//lst[@name='spellcheck']/lst[@name='suggestions']/lst[@name='fox']/arr[@name='suggestion']/lst/int[@name='freq']=2", "//lst[@name='spellcheck']/bool[@name='correctlySpelled']='true'" ); - } + } + + @Test + public void testMaxQueryLength() throws Exception { +testMaxQueryLength(true); +testMaxQueryLength(false); + } + + private void testMaxQueryLength(Boolean limitQueryLength) throws Exception { + +DirectSolrSpellChecker checker = new DirectSolrSpellChecker(); +NamedList spellchecker = new NamedList(); Review comment: I've just used what you suggested here - am not too familiar with how this works.
[jira] [Comment Edited] (SOLR-13890) Add postfilter support to {!terms} queries
[ https://issues.apache.org/jira/browse/SOLR-13890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17002424#comment-17002424 ] Joel Bernstein edited comment on SOLR-13890 at 12/23/19 8:23 PM: - The other really big aspect of this is caching. Even though the scorer-based filter can be fast if it's applied with the main query, in Solr that's not going to happen. The reason is the filter cache. Solr will apply the filter against the *entire index* and create a DocSet to cache. This will be slow compared to the postfilter if the number of search results is small relative to the size of the index. Which might be acceptable if the filter cache provided a big advantage on subsequent requests. But ... Solr's filter cache is top level so it gets dumped after a single document is loaded. So in scenarios where there is lots of indexing going on the filter cache becomes problematic. There are ways around this issue, like turning off caching using local params, or not using filter queries. But these approaches are not what users typically do with a filter. So, the postfilter's behavior (not cached in the filter cache) provides the best solution for certain situations where the filter cache is problematic. was (Author: joel.bernstein): The other really big aspect of this is caching. Even though the scorer-based filter can be fast if it's applied with the main query, in Solr that's not going to happen. The reason is the filter cache. Solr will apply the filter against the entire index and create a DocSet to cache. Our filter cache is top level so it gets dumped after a single document is loaded. So in scenarios where there is lots of indexing going on the filter cache becomes problematic. There are ways around this issue, like turning off caching using local params, or not using filter queries. But these approaches are not what users typically do with a filter. So, the postfilter's behavior (not cached in the filter cache) provides the best solution for certain situations where the filter cache is problematic. > Add postfilter support to {!terms} queries > -- > > Key: SOLR-13890 > URL: https://issues.apache.org/jira/browse/SOLR-13890 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: query parsers >Affects Versions: master (9.0) >Reporter: Jason Gerlowski >Assignee: Jason Gerlowski >Priority: Major > Attachments: SOLR-13890.patch, SOLR-13890.patch, SOLR-13890.patch > > > There are some use-cases where it'd be nice if the "terms" qparser created a > query that could be run as a postfilter. Particularly, when users are > checking for hundreds or thousands of terms, a postfilter implementation can > be more performant than the standard processing. > With this issue, I'd like to propose a post-filter implementation for the > {{docValuesTermsFilter}} "method". Postfilter creation can use a > SortedSetDocValues object to populate a DV bitset with the "terms" being > checked for. Each document run through the post-filter can look at their > doc-values for the field in question and check them efficiently against the > constructed bitset.
[jira] [Comment Edited] (SOLR-13890) Add postfilter support to {!terms} queries
[ https://issues.apache.org/jira/browse/SOLR-13890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17002424#comment-17002424 ] Joel Bernstein edited comment on SOLR-13890 at 12/23/19 8:24 PM: - The other really big aspect of this is caching. Even though the scorer-based filter can be fast if it's applied with the main query, in Solr that's not going to happen. The reason is the filter cache. Solr will apply the filter against the *entire index* and create a DocSet to cache. This will be slow compared to the postfilter if the number of search results is small relative to the size of the index. Which might be acceptable if the filter cache provided a big advantage on subsequent requests. But ... Solr's filter cache is top level so it gets dumped after a single document is loaded. So in scenarios where there is lots of indexing going on the filter cache becomes problematic. There are ways around this issue, like turning off caching using local params, or not using filter queries. But these approaches are not what users typically do with a filter. So, the postfilter's behavior (not cached in the filter cache) provides the best solution for certain situations where the filter cache is problematic. was (Author: joel.bernstein): The other really big aspect of this is caching. Even though the scorer-based filter can be fast if it's applied with the main query, in Solr that's not going to happen. The reason is the filter cache. Solr will apply the filter against the *entire index* and create a DocSet to cache. This will be slow compared to the postfilter if the number of search results is small relative to the size of the index. Which might be acceptable if the filter cache provided a big advantage on subsequent requests. But ... Solr's filter cache is top level so it gets dumped after a single document is loaded. So in scenarios where there is lots of indexing going on the filter cache becomes problematic. There are ways around this issue, like turning off caching using local params, or not using filter queries. But these approaches are not what users typically do with a filter. So, the postfilter's behavior (not cached in the filter cache) provides the best solution for certain situations where the filter cache is problematic.
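The "turning off caching using local params" workaround mentioned above can be sketched as follows. The collection, field, and values are placeholders; `cache=false` keeps the filter out of the filter cache, and a `cost` of 100 or more asks Solr to run a PostFilter-capable query as a postfilter:

```shell
# Hypothetical request: run a {!terms} filter uncached, as a postfilter.
# cache=false skips the filter cache; cost>=100 requests post-filtering
# for query types that implement the PostFilter interface.
curl -G 'http://localhost:8983/solr/techproducts/select' \
  --data-urlencode 'q=memory' \
  --data-urlencode 'fq={!terms f=id cache=false cost=101}SP2514N,6H500F0'
```

Because nothing is cached, this avoids the whole-index DocSet computation described above at the price of re-evaluating the filter on every request.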
[jira] [Commented] (SOLR-14141) eliminate JKS keystore from solr SSL docs
[ https://issues.apache.org/jira/browse/SOLR-14141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17002488#comment-17002488 ] Robert Muir commented on SOLR-14141: The funniest part about this is that this step 1 is really creating a pkcs12 keystore. It is in fact not jks :) And the next step 2 that "converts" it is just converting pkcs12 <-> pkcs12. This craziness currently works because of how java's default security config is defined: {noformat} # # Default keystore type. # keystore.type=pkcs12 # # Controls compatibility mode for JKS and PKCS12 keystore types. # # When set to 'true', both JKS and PKCS12 keystore types support loading # keystore files in either JKS or PKCS12 format. When set to 'false' the # JKS keystore type supports loading only JKS keystore files and the PKCS12 # keystore type supports loading only PKCS12 keystore files. # keystore.type.compat=true {noformat} > eliminate JKS keystore from solr SSL docs > - > > Key: SOLR-14141 > URL: https://issues.apache.org/jira/browse/SOLR-14141 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Robert Muir >Priority: Major > > On the "Enabling SSL" page: > https://lucene.apache.org/solr/guide/8_3/enabling-ssl.html#enabling-ssl > The first step is currently to create a JKS keystore. The next step > immediately converts the JKS keystore into PKCS12, so that openssl can then > be used to extract key material in PEM format for use with curl. > Now that PKCS12 is java's default keystore format, why not omit step 1 > entirely? What am I missing? PKCS12 is a more commonly > understood/standardized format. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] uschindler commented on issue #1114: LUCENE-9109: Use stack walker to implement TestSecurityManager's detection of test JVM exit
uschindler commented on issue #1114: LUCENE-9109: Use stack walker to implement TestSecurityManager's detection of test JVM exit URL: https://github.com/apache/lucene-solr/pull/1114#issuecomment-568582551 I just changed the static final predicate to a static method.
[jira] [Commented] (SOLR-14141) eliminate JKS keystore from solr SSL docs
[ https://issues.apache.org/jira/browse/SOLR-14141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17002492#comment-17002492 ] Uwe Schindler commented on SOLR-14141: -- FYI, it was always possible to run jetty with a p12 keystore. I ran Mortbay Jetty 10 years ago using a simple p12 file.
[jira] [Commented] (SOLR-14141) eliminate JKS keystore from solr SSL docs
[ https://issues.apache.org/jira/browse/SOLR-14141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17002493#comment-17002493 ] Robert Muir commented on SOLR-14141: Yes possible, but the defaults/compat change described here looks like it happened in Java 8: https://openjdk.java.net/jeps/166 So we can easily simplify. And if someone really does have an ancient JKS keystore, it is no problem, even if we wrongly tell Java that it is in fact PKCS12. We are doing that already today in the opposite fashion (telling Java the thing is JKS format, but in reality it's PKCS12).
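A sketch of what the simplified docs flow could look like: create the keystore as PKCS12 from the start, then export PEM material for curl. The alias, filenames, passwords, and DN below are placeholders, not wording from the Ref Guide:

```shell
# Generate the key pair directly in a PKCS12 keystore -- no JKS step and
# no JKS->PKCS12 "conversion". PKCS12 has long been loadable by Java/Jetty
# and is the JDK's default keystore type since Java 9.
keytool -genkeypair -alias solr-ssl -keyalg RSA -keysize 2048 \
  -storetype PKCS12 -keystore solr-ssl.keystore.p12 \
  -storepass secret -keypass secret \
  -dname "CN=localhost, OU=Engineering, O=Example, C=US" \
  -ext "SAN=DNS:localhost,IP:127.0.0.1"

# Extract certificate and key in PEM format for use with curl.
openssl pkcs12 -in solr-ssl.keystore.p12 -out solr-ssl.pem \
  -passin pass:secret -passout pass:secret
```

Because `keystore.type.compat=true` (quoted earlier in this thread), an old JKS file handed to a PKCS12-typed load would still open, so the simplification should be safe for users with legacy keystores.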
[GitHub] [lucene-solr] megancarey opened a new pull request #1115: SOLR-13101: Fix the gson version reference
megancarey opened a new pull request #1115: SOLR-13101: Fix the gson version reference URL: https://github.com/apache/lucene-solr/pull/1115 # Description Please provide a short description of the changes you're making with this pull request. # Solution Please provide a short description of the approach taken to implement your solution. # Tests Please describe the tests you've developed or run to confirm this patch implements the feature or solves the problem. # Checklist Please review the following and check all that apply: - [x] I have reviewed the guidelines for [How to Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms to the standards described there to the best of my ability. - [x] I have created a Jira issue and added the issue ID to my pull request title. - [x] I have given Solr maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended) - [ ] I have developed this patch against the `master` branch. - [ ] I have run `ant precommit` and the appropriate test suite. - [ ] I have added tests for my changes. - [ ] I have added documentation for the [Ref Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) (for Solr changes only).
[GitHub] [lucene-solr] tflobbe opened a new pull request #1116: SOLR-14135: Utils.toJavabin returns a byte[] instead of InputStream
tflobbe opened a new pull request #1116: SOLR-14135: Utils.toJavabin returns a byte[] instead of InputStream URL: https://github.com/apache/lucene-solr/pull/1116 I'm not too convinced about this PR honestly; I started thinking of doing this mostly because in the 8x branch we can't use InputStream's `readAllBytes()` method, but this may actually hurt future consumers of this method, if they don't need to read all bytes at once. I'll leave this PR for now; worst case I'll keep the tests.
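The 8x limitation mentioned here (`InputStream.readAllBytes()` only exists from Java 9 onward) can also be worked around with a small drain helper instead of changing the return type. This is an illustrative sketch under that assumption, not code from the PR:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical Java 8-compatible stand-in for InputStream.readAllBytes().
public class StreamDrain {

    static byte[] readAllBytes(InputStream in) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[8192];
            int n;
            // Copy until EOF; works for streams of unknown length.
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With a helper like this, Utils.toJavabin could keep returning an InputStream on both branches, and callers who truly need all bytes at once drain it themselves.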
[jira] [Updated] (LUCENE-9093) Unified highlighter with word separator never gives context to the left
[ https://issues.apache.org/jira/browse/LUCENE-9093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley updated LUCENE-9093: - Status: Patch Available (was: Open) > Unified highlighter with word separator never gives context to the left > --- > > Key: LUCENE-9093 > URL: https://issues.apache.org/jira/browse/LUCENE-9093 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/highlighter >Reporter: Tim Retout >Priority: Major > Attachments: LUCENE-9093.patch > > > When using the unified highlighter with hl.bs.type=WORD, I am not able to get > context to the left of the matches returned; only words to the right of each > match are shown. I see this behaviour on both Solr 6.4 and Solr 7.1. > Without context to the left of a match, the highlighted snippets are much > less useful for understanding where the match appears in a document. > As an example, using the techproducts data with Solr 7.1, given a search for > "apple", highlighting the "features" field: > http://localhost:8983/solr/techproducts/select?hl.fl=features&hl=on&q=apple&hl.bs.type=WORD&hl.fragsize=30&hl.method=unified > I see this snippet: > "Apple Lossless, H.264 video" > Note that "Apple" is anchored to the left. Compare with the original > highlighter: > http://localhost:8983/solr/techproducts/select?hl.fl=features&hl=on&q=apple&hl.fragsize=30 > And the match has context either side: > ", Audible, Apple Lossless, H.264 video" > (To complicate this, in general I am not sure that the unified highlighter is > respecting the hl.fragsize parameter, although [SOLR-9935] suggests support > was added. I included the hl.fragsize param in the unified URL too, but it's > making no difference unless set to 0.) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9093) Unified highlighter with word separator never gives context to the left
[ https://issues.apache.org/jira/browse/LUCENE-9093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17002506#comment-17002506 ] David Smiley commented on LUCENE-9093: -- Sorry for the delay. Can you please post a PR as it's more conducive to the code review process? I have a question about this setting. You've declared the benefits of it for a {{hl.bs.type=WORD}} but would this also be helpful for SENTENCE too? I hope so. I think in 9.0 the {{hl.fragalign}} setting should default to {{0.5}} or maybe {{0.25}}
[jira] [Commented] (SOLR-13101) Shared storage support in SolrCloud
[ https://issues.apache.org/jira/browse/SOLR-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17002508#comment-17002508 ] Noble Paul commented on SOLR-13101: --- I would love to see a few more details. Is it a standard Solr plugin that I can define in solrconfig.xml? Can it be configured through a remote API? If yes, which one? If not, let's have a separate discussion. What are the public touch points? * remote APIs * configurations * files created/used in ZK/filesystem We need to make every new addition to Solr easily digestible to a casual observer. > Shared storage support in SolrCloud > --- > > Key: SOLR-13101 > URL: https://issues.apache.org/jira/browse/SOLR-13101 > Project: Solr > Issue Type: New Feature > Components: SolrCloud >Reporter: Yonik Seeley >Priority: Major > Time Spent: 8h > Remaining Estimate: 0h > > Solr should have first-class support for shared storage (blob/object stores > like S3, google cloud storage, etc. and shared filesystems like HDFS, NFS, > etc). > The key component will likely be a new replica type for shared storage. It > would have many of the benefits of the current "pull" replicas (not indexing > on all replicas, all shards identical with no shards getting out-of-sync, > etc), but would have additional benefits: > - Any shard could become leader (the blob store always has the index) > - Better elasticity scaling down >- durability not linked to number of replicas.. a single replica could be > common for write workloads >- could drop to 0 replicas for a shard when not needed (blob store always > has index) > - Allow for higher performance write workloads by skipping the transaction > log >- don't pay for what you don't need >- a commit will be necessary to flush to stable storage (blob store) > - A lot of the complexity and failure modes go away > An additional component is a Directory implementation that will work well with > blob stores.
We probably want one that treats local disk as a cache since > the latency to remote storage is so large. I think there are still some > "locking" issues to be solved here (ensuring that more than one writer to the > same index won't corrupt it). This should probably be pulled out into a > different JIRA issue.
[jira] [Commented] (LUCENE-9091) UnifiedHighlighter HTML escaping should only escape essentials
[ https://issues.apache.org/jira/browse/LUCENE-9091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17002517#comment-17002517 ] ASF subversion and git services commented on LUCENE-9091: - Commit 1be5b689640fe4d1bf0ae3fd19c5fe93b20a77ef in lucene-solr's branch refs/heads/master from Nándor Mátravölgyi [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=1be5b68 ] LUCENE-9091: UnifiedHighlighter HTML escaping should only escape essentials > UnifiedHighlighter HTML escaping should only escape essentials > -- > > Key: LUCENE-9091 > URL: https://issues.apache.org/jira/browse/LUCENE-9091 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/highlighter >Reporter: Nándor Mátravölgyi >Assignee: David Smiley >Priority: Minor > Attachments: LUCENE-9091.patch > > > The unified highlighter does not use the > *org.apache.lucene.search.highlight.SimpleHTMLEncoder* through > *org.apache.solr.highlight.HtmlEncoder*. It has the HTML escaping feature > re-implemented and embedded in the > *org.apache.lucene.search.uhighlight.DefaultPassageFormatter*. > The HTML escaping done by the unified highlighter escapes characters that do > not need it. This makes the result payload 50%+ more heavy with no benefit. > Here is a highlight snippet using the original highlighter: > {noformat} > A filter that stems words using a Snowball-generated stemmer. > Available stemmers & x are listed in org.tartarus.snowball.ext. Note: > This filter is aware of the KeywordAttribute. > {noformat} > Here is the same highlight snippet using the unified highlighter: > {noformat} > A filter that stems words using a Snowball-generated stemmer. Available stemmers & x are listed in org.tartarus.snowball.ext. Note: This filter is aware of the KeywordAttribute. > {noformat} > Maybe I'm missing the point why this is done the way it is. 
> If this behaviour is desired for some use-case it should be a separate encoder, and the HTML encoder should only escape the necessary characters.
> Affects all versions of Lucene-Solr since the addition of the UnifiedHighlighter. Here are the lines where the escaping is implemented differently:
> * [Escaping by the unified highlighter|https://github.com/apache/lucene-solr/blob/2387bb9d60ae44eeeb4fbcb2f2877f46be5303a0/lucene/highlighter/src/java/org/apache/lucene/search/uhighlight/DefaultPassageFormatter.java#L132]
> * [Escaping by the other highlighters|https://github.com/apache/lucene-solr/blob/2387bb9d60ae44eeeb4fbcb2f2877f46be5303a0/lucene/highlighter/src/java/org/apache/lucene/search/highlight/SimpleHTMLEncoder.java#L69]
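To make the "escape only essentials" idea concrete, here is a minimal sketch of an HTML encoder that escapes only the five characters that can actually break HTML markup and leaves everything else intact. The class name is hypothetical; this is not the actual Lucene `SimpleHTMLEncoder` source, just an illustration of the behaviour the issue asks for:

```java
// Hypothetical sketch -- not the actual Lucene SimpleHTMLEncoder code.
// Escapes only the characters that are unsafe in HTML text or attribute
// context; every other character (including non-ASCII) passes through
// unchanged, which keeps the highlighted payload close to its original size.
public class MinimalHtmlEncoder {
  public static String encode(String s) {
    StringBuilder sb = new StringBuilder(s.length());
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      switch (c) {
        case '&':  sb.append("&amp;");  break;
        case '<':  sb.append("&lt;");   break;
        case '>':  sb.append("&gt;");   break;
        case '"':  sb.append("&quot;"); break;
        case '\'': sb.append("&#x27;"); break;
        default:   sb.append(c);
      }
    }
    return sb.toString();
  }

  public static void main(String[] args) {
    // prints: Available stemmers &amp; &lt;b&gt;more&lt;/b&gt;
    System.out.println(encode("Available stemmers & <b>more</b>"));
  }
}
```

An encoder like this leaves a plain-prose snippet byte-for-byte identical to its input, which is exactly why the over-escaping in `DefaultPassageFormatter` shows up as a 50%+ payload increase on entity-heavy text.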
[jira] [Commented] (LUCENE-9091) UnifiedHighlighter HTML escaping should only escape essentials
[ https://issues.apache.org/jira/browse/LUCENE-9091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17002519#comment-17002519 ] ASF subversion and git services commented on LUCENE-9091: - Commit 80ad056babe577a63edf81f71d3fe525124ff43a in lucene-solr's branch refs/heads/branch_8x from Nándor Mátravölgyi [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=80ad056 ] LUCENE-9091: UnifiedHighlighter HTML escaping should only escape essentials (cherry picked from commit 1be5b689640fe4d1bf0ae3fd19c5fe93b20a77ef)
[jira] [Commented] (SOLR-14095) Remove serialization and/or support serialization filtering
[ https://issues.apache.org/jira/browse/SOLR-14095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17002528#comment-17002528 ] ASF subversion and git services commented on SOLR-14095: Commit 5f5ef58117578045de3798dd487b89246c15a23b in lucene-solr's branch refs/heads/branch_8x from Tomas Eduardo Fernandez Lobbe [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=5f5ef58 ] SOLR-14095: Fix Java 8 compile issue

> Remove serialization and/or support serialization filtering
> Key: SOLR-14095
> URL: https://issues.apache.org/jira/browse/SOLR-14095
> Project: Solr
> Issue Type: Task
> Security Level: Public (Default Security Level. Issues are Public)
> Reporter: Robert Muir
> Priority: Major
> Attachments: SOLR-14095-json.patch, json-nl.patch
> Time Spent: 1h 20m
> Remaining Estimate: 0h
>
> Removing the use of serialization is greatly preferred.
> But if serialization over the wire must really happen, then we must use JDK's serialization filtering capability to prevent havoc.
> https://docs.oracle.com/javase/10/core/serialization-filtering1.htm#JSCOR-GUID-3ECB288D-E5BD-4412-892F-E9BB11D4C98A
[jira] [Commented] (SOLR-14095) Remove serialization and/or support serialization filtering
[ https://issues.apache.org/jira/browse/SOLR-14095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17002527#comment-17002527 ] ASF subversion and git services commented on SOLR-14095: Commit fe04a5b6f0a5ea3c8d1d2675d12740d299d1c4b0 in lucene-solr's branch refs/heads/branch_8x from Tomas Eduardo Fernandez Lobbe [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=fe04a5b ] SOLR-14095: Let the overseer use javabin to store responses in ZooKeeper (#1095) The Overseer used java serialization to store command responses in ZooKeeper. This commit changes the code to use Javabin instead, while allowing Java serialization with a System property in case it's needed for compatibility
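For reference, the JDK serialization-filtering capability linked in the issue (`java.io.ObjectInputFilter`, JDK 9+) looks roughly like the sketch below. The allow-list pattern and class names are illustrative, not Solr's actual policy:

```java
import java.io.*;

// Minimal sketch of JDK serialization filtering (java.io.ObjectInputFilter,
// JDK 9+). The filter pattern allows only java.util.* and java.lang.* classes
// and rejects everything else ("!*"); a rejected class makes readObject()
// throw InvalidClassException. The pattern here is an illustrative allow-list,
// not Solr's real configuration.
public class FilteredDeserialize {

  public static byte[] serialize(Object o) throws IOException {
    ByteArrayOutputStream bos = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
      out.writeObject(o);
    }
    return bos.toByteArray();
  }

  public static Object readFiltered(byte[] bytes)
      throws IOException, ClassNotFoundException {
    ObjectInputFilter filter =
        ObjectInputFilter.Config.createFilter("java.util.*;java.lang.*;!*");
    try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
      in.setObjectInputFilter(filter);  // applied before each class is resolved
      return in.readObject();
    }
  }

  // A class outside the allow-list, used to demonstrate rejection.
  public static class NotAllowed implements Serializable {}

  public static void main(String[] args) throws Exception {
    System.out.println(readFiltered(serialize(
        new java.util.ArrayList<>(java.util.Arrays.asList("a", "b")))));
    try {
      readFiltered(serialize(new NotAllowed()));
    } catch (InvalidClassException expected) {
      System.out.println("rejected: " + expected.getMessage());
    }
  }
}
```

This is the "prevent havoc" half of the ticket; the committed fix goes further and avoids Java serialization in the Overseer entirely by switching to Javabin.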
[GitHub] [lucene-solr] noblepaul commented on a change in pull request #1109: More pervasive use of PackageLoader / PluginInfo
noblepaul commented on a change in pull request #1109: More pervasive use of PackageLoader / PluginInfo URL: https://github.com/apache/lucene-solr/pull/1109#discussion_r361021908 ## File path: solr/core/src/java/org/apache/solr/core/SolrResourceLoader.java ## @@ -954,4 +987,46 @@ public static void persistConfLocally(SolrResourceLoader loader, String resource } } + // TODO document these methods... Review comment: What is the motivation behind `SolrResourceLoader` returning packages? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] noblepaul commented on a change in pull request #1109: More pervasive use of PackageLoader / PluginInfo
noblepaul commented on a change in pull request #1109: More pervasive use of PackageLoader / PluginInfo URL: https://github.com/apache/lucene-solr/pull/1109#discussion_r361022358 ## File path: solr/core/src/java/org/apache/solr/schema/IndexSchemaFactory.java ## @@ -62,7 +62,7 @@ public static IndexSchema buildIndexSchema(String resourceName, SolrConfig confi PluginInfo info = config.getPluginInfo(IndexSchemaFactory.class.getName()); IndexSchemaFactory factory; if (null != info) { - factory = config.getResourceLoader().newInstance(info.className, IndexSchemaFactory.class); + factory = config.getResourceLoader().newInstance(info, IndexSchemaFactory.class); Review comment: Do we even support the `packageName:ClassName` scheme in `schema.xml` at all? How does it play with this? Needs more discussion.
[GitHub] [lucene-solr] noblepaul commented on a change in pull request #1109: More pervasive use of PackageLoader / PluginInfo
noblepaul commented on a change in pull request #1109: More pervasive use of PackageLoader / PluginInfo URL: https://github.com/apache/lucene-solr/pull/1109#discussion_r361020495 ## File path: solr/core/src/java/org/apache/solr/core/SolrCore.java ## @@ -519,7 +518,7 @@ private IndexDeletionPolicyWrapper initDeletionPolicy(IndexDeletionPolicyWrapper final PluginInfo info = solrConfig.getPluginInfo(IndexDeletionPolicy.class.getName()); final IndexDeletionPolicy delPolicy; if (info != null) { - delPolicy = createInstance(info.className, IndexDeletionPolicy.class, "Deletion Policy for SOLR", this, getResourceLoader()); + delPolicy = newInstance(info, IndexDeletionPolicy.class, this, getResourceLoader()); Review comment: Looks wrong. Shouldn't we get the correct Package classloader here?
[GitHub] [lucene-solr] noblepaul commented on a change in pull request #1109: More pervasive use of PackageLoader / PluginInfo
noblepaul commented on a change in pull request #1109: More pervasive use of PackageLoader / PluginInfo URL: https://github.com/apache/lucene-solr/pull/1109#discussion_r361018813 ## File path: solr/core/src/java/org/apache/solr/core/DirectoryFactory.java ## @@ -420,7 +420,7 @@ static DirectoryFactory loadDirectoryFactory(SolrConfig config, CoreContainer cc final DirectoryFactory dirFactory; if (info != null) { log.debug(info.className); - dirFactory = config.getResourceLoader().newInstance(info.className, DirectoryFactory.class); + dirFactory = config.getResourceLoader().newInstance(info, DirectoryFactory.class); Review comment: What's the point of this call? We should never use `config.getResourceLoader()`
[GitHub] [lucene-solr] noblepaul commented on a change in pull request #1109: More pervasive use of PackageLoader / PluginInfo
noblepaul commented on a change in pull request #1109: More pervasive use of PackageLoader / PluginInfo URL: https://github.com/apache/lucene-solr/pull/1109#discussion_r361021700 ## File path: solr/core/src/java/org/apache/solr/core/SolrResourceLoader.java ## @@ -529,6 +536,18 @@ public String resourceLocation(String resource) { Class clazz = null; try { + // If there is a package name prefix ... + Pair pkgClassPair = PluginInfo.parseClassName(cname); + PackageLoader.Package pkg = getPackage(pkgClassPair.first()); + if (pkg == null) { +// essentially, remove the package prefix and continue as normal. Maybe it'll be found. +cname = pkgClassPair.second(); + } else { +// TODO what version? Review comment: Trying to load the latest always? Why? Need to revisit.
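The `packageName:className` resolution being reviewed above can be sketched as follows. This is a hypothetical illustration (the real code returns a `Pair` from `PluginInfo.parseClassName`; the names `PkgResolver` and `parse` are invented here), showing the split-then-fall-back behavior discussed in the diff:

```java
// Hypothetical sketch of "packageName:className" resolution: split on the
// first ':' to get a package prefix, and if the package is unknown, strip the
// prefix and fall back to treating the whole string as a plain class name.
// Class and method names are illustrative, not Solr's actual API.
public class PkgResolver {

  /** Returns {packageName, className}; packageName is null when there is no prefix. */
  public static String[] parse(String cname) {
    int idx = cname.indexOf(':');
    if (idx <= 0) {
      return new String[] {null, cname};  // no usable package prefix
    }
    return new String[] {cname.substring(0, idx), cname.substring(idx + 1)};
  }

  public static void main(String[] args) {
    String[] p = parse("mypkg:com.example.MyPlugin");
    // prints: mypkg / com.example.MyPlugin
    System.out.println(p[0] + " / " + p[1]);
  }
}
```

The reviewer's objection is about the branch after this split: when the package *is* found, the patch always loads the package's latest version, with no policy for which version a core should pin.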
[GitHub] [lucene-solr] noblepaul commented on a change in pull request #1109: More pervasive use of PackageLoader / PluginInfo
noblepaul commented on a change in pull request #1109: More pervasive use of PackageLoader / PluginInfo URL: https://github.com/apache/lucene-solr/pull/1109#discussion_r361019528 ## File path: solr/core/src/java/org/apache/solr/core/PluginBag.java ## @@ -140,11 +139,10 @@ public static void initInstance(Object inst, PluginInfo info) { log.debug("{} : '{}' created with startup=lazy ", meta.getCleanTag(), info.name); return new LazyPluginHolder(meta, info, core, core.getResourceLoader(), false); } else { - if (info.pkgName != null) { -PackagePluginHolder holder = new PackagePluginHolder<>(info, core, meta); -return holder; + if (core.getResourceLoader().getPackage(info.pkgName) != null) { Review comment: Shouldn't we fail fast instead of continuing as if there is no problem?
[GitHub] [lucene-solr] noblepaul commented on a change in pull request #1109: More pervasive use of PackageLoader / PluginInfo
noblepaul commented on a change in pull request #1109: More pervasive use of PackageLoader / PluginInfo URL: https://github.com/apache/lucene-solr/pull/1109#discussion_r361021135 ## File path: solr/core/src/java/org/apache/solr/core/SolrCore.java ## @@ -810,28 +809,30 @@ void initIndex(boolean passOnPreviousState, boolean reload) throws IOException { * Creates an instance by trying a constructor that accepts a SolrCore before * trying the default (no arg) constructor. * - * @param className the instance class to create + * @param pluginInfo the instance class to create * @param cast the class or interface that the instance should extend or implement - * @param msg a message helping compose the exception error if any occurs. * @param core The SolrCore instance for which this object needs to be loaded * @return the desired instance * @throws SolrException if the object could not be instantiated */ - public static T createInstance(String className, Class cast, String msg, SolrCore core, ResourceLoader resourceLoader) { -Class clazz = null; -if (msg == null) msg = "SolrCore Object"; + public static T newInstance(PluginInfo pluginInfo, Class cast, SolrCore core, SolrResourceLoader resourceLoader) { +String msg = pluginInfo.type; try { - clazz = resourceLoader.findClass(className, cast); - //most of the classes do not have constructors which takes SolrCore argument. It is recommended to obtain SolrCore by implementing SolrCoreAware. - // So invariably always it will cause a NoSuchMethodException. 
So iterate through the list of available constructors - Constructor[] cons = clazz.getConstructors(); - for (Constructor con : cons) { -Class[] types = con.getParameterTypes(); -if (types.length == 1 && types[0] == SolrCore.class) { - return cast.cast(con.newInstance(core)); + //TODO separate out "core" scenario to another method + if (pluginInfo.pkgName == null && core != null) { +Class clazz = resourceLoader.findClass(pluginInfo.className, cast); +//most of the classes do not have constructors which takes SolrCore argument. It is recommended to obtain SolrCore by implementing SolrCoreAware. Review comment: This has nothing to do with the package loader. Can be a separate ticket.
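The constructor-scanning pattern quoted in the diff above can be sketched generically like this. It is a hedged illustration, not Solr's `SolrCore.newInstance`: prefer a public one-argument constructor that accepts the supplied "core"-like object, else fall back to the no-arg constructor, which avoids the guaranteed `NoSuchMethodException` from asking directly for a constructor most plugin classes do not declare:

```java
import java.lang.reflect.Constructor;

// Generic sketch of the reflection pattern in the diff: scan the public
// constructors for one whose single parameter can accept ctorArg; otherwise
// use the no-arg constructor. Names are illustrative, not Solr's actual API.
public class PluginFactory {

  public static <T> T newInstance(Class<T> clazz, Object ctorArg) throws Exception {
    for (Constructor<?> con : clazz.getConstructors()) {
      Class<?>[] types = con.getParameterTypes();
      if (types.length == 1 && types[0].isInstance(ctorArg)) {
        return clazz.cast(con.newInstance(ctorArg));  // one-arg constructor wins
      }
    }
    return clazz.cast(clazz.getConstructor().newInstance());  // no-arg fallback
  }

  // Example plugin with both constructor shapes.
  public static class Greeter {
    final String who;
    public Greeter(String who) { this.who = who; }
    public Greeter() { this("nobody"); }
  }

  public static void main(String[] args) throws Exception {
    // prints: core
    System.out.println(newInstance(Greeter.class, "core").who);
  }
}
```

Note the reviewer's larger point stands regardless of this mechanism: the instance is constructed once at core load, and nothing here re-creates it when the backing package is updated.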
[GitHub] [lucene-solr] noblepaul commented on a change in pull request #1109: More pervasive use of PackageLoader / PluginInfo
noblepaul commented on a change in pull request #1109: More pervasive use of PackageLoader / PluginInfo URL: https://github.com/apache/lucene-solr/pull/1109#discussion_r361021282 ## File path: solr/core/src/java/org/apache/solr/core/SolrCore.java ## @@ -890,7 +897,7 @@ private UpdateHandler createReloadedUpdateHandler(String className, String msg, } private UpdateHandler createUpdateHandler(String className) { -return createInstance(className, UpdateHandler.class, "Update Handler", this, getResourceLoader()); +return newInstance(new PluginInfo("updateHandler", className), UpdateHandler.class, this, getResourceLoader()); Review comment: Again, no thought given to how this updates if the package is updated.
[GitHub] [lucene-solr] noblepaul commented on a change in pull request #1109: More pervasive use of PackageLoader / PluginInfo
noblepaul commented on a change in pull request #1109: More pervasive use of PackageLoader / PluginInfo URL: https://github.com/apache/lucene-solr/pull/1109#discussion_r361017640 ## File path: solr/contrib/velocity/src/java/org/apache/solr/response/VelocityResponseWriter.java ## @@ -275,7 +276,7 @@ private VelocityContext createContext(SolrQueryRequest request, SolrQueryRespons for (Map.Entry entry : customTools.entrySet()) { String name = entry.getKey(); // TODO: at least log a warning when one of the *fixed* tools classes is same name with a custom one, currently silently ignored - Object customTool = SolrCore.createInstance(entry.getValue(), Object.class, "VrW custom tool: " + name, request.getCore(), request.getCore().getResourceLoader()); + Object customTool = SolrCore.newInstance(new PluginInfo(name, entry.getValue()), Object.class, request.getCore(), request.getCore().getResourceLoader()); Review comment: What is the purpose of this? Apparently this is not even using the right classloader.
[GitHub] [lucene-solr] noblepaul commented on a change in pull request #1109: More pervasive use of PackageLoader / PluginInfo
noblepaul commented on a change in pull request #1109: More pervasive use of PackageLoader / PluginInfo URL: https://github.com/apache/lucene-solr/pull/1109#discussion_r361021009 ## File path: solr/core/src/java/org/apache/solr/core/SolrCore.java ## @@ -810,28 +809,30 @@ void initIndex(boolean passOnPreviousState, boolean reload) throws IOException { * Creates an instance by trying a constructor that accepts a SolrCore before * trying the default (no arg) constructor. * - * @param className the instance class to create + * @param pluginInfo the instance class to create * @param cast the class or interface that the instance should extend or implement - * @param msg a message helping compose the exception error if any occurs. * @param core The SolrCore instance for which this object needs to be loaded * @return the desired instance * @throws SolrException if the object could not be instantiated */ - public static T createInstance(String className, Class cast, String msg, SolrCore core, ResourceLoader resourceLoader) { -Class clazz = null; -if (msg == null) msg = "SolrCore Object"; + public static T newInstance(PluginInfo pluginInfo, Class cast, SolrCore core, SolrResourceLoader resourceLoader) { +String msg = pluginInfo.type; try { - clazz = resourceLoader.findClass(className, cast); - //most of the classes do not have constructors which takes SolrCore argument. It is recommended to obtain SolrCore by implementing SolrCoreAware. - // So invariably always it will cause a NoSuchMethodException. 
So iterate through the list of available constructors - Constructor[] cons = clazz.getConstructors(); - for (Constructor con : cons) { -Class[] types = con.getParameterTypes(); -if (types.length == 1 && types[0] == SolrCore.class) { - return cast.cast(con.newInstance(core)); + //TODO separate out "core" scenario to another method + if (pluginInfo.pkgName == null && core != null) { +Class clazz = resourceLoader.findClass(pluginInfo.className, cast); Review comment: The repeated pattern I see is: the patch only looks at how to load the class when the core is loaded, and pays no attention to how to reload it if the package is updated.