[jira] [Commented] (SOLR-12045) Move Analytics Component from contrib to core
[ https://issues.apache.org/jira/browse/SOLR-12045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023733#comment-17023733 ] Uwe Schindler commented on SOLR-12045: -- Your commit seems to break precommit (interestingly only on Windows): {noformat} validate-source-patterns: [source-patterns] Unescaped symbol "->" on line #43: solr/solr-ref-guide/src/analytics.adoc [source-patterns] Unescaped symbol "->" on line #52: solr/solr-ref-guide/src/analytics.adoc {noformat} > Move Analytics Component from contrib to core > - > > Key: SOLR-12045 > URL: https://issues.apache.org/jira/browse/SOLR-12045 > Project: Solr > Issue Type: Improvement >Affects Versions: 8.0 >Reporter: Houston Putman >Priority: Major > Fix For: 8.1, master (9.0) > > Attachments: SOLR-12045.rb-visibility.patch > > Time Spent: 1.5h > Remaining Estimate: 0h > > The Analytics Component currently lives in contrib. Since it includes no > external dependencies, there is no harm in moving it into core solr. > The analytics component would be included as a default search component and > the analytics handler (currently only used for analytics shard requests, > might be transitioned to handle user requests in the future) would be > included as an implicit handler. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] dweiss merged pull request #1121: SOLR-11207: Add OWASP dependency checker to gradle build
dweiss merged pull request #1121: SOLR-11207: Add OWASP dependency checker to gradle build URL: https://github.com/apache/lucene-solr/pull/1121 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-11207) Add OWASP dependency checker to detect security vulnerabilities in third party libraries
[ https://issues.apache.org/jira/browse/SOLR-11207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023735#comment-17023735 ] ASF subversion and git services commented on SOLR-11207: Commit 74a8d6d5acc67e4d5c6eeb640b8de3f820f0774b in lucene-solr's branch refs/heads/master from Jan Høydahl [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=74a8d6d ] SOLR-11207: Add OWASP dependency checker to gradle build (#1121) * SOLR-11207: Add OWASP dependency checker to gradle build > Add OWASP dependency checker to detect security vulnerabilities in third > party libraries > > > Key: SOLR-11207 > URL: https://issues.apache.org/jira/browse/SOLR-11207 > Project: Solr > Issue Type: Improvement > Components: Build >Affects Versions: 6.0 >Reporter: Hrishikesh Gadre >Assignee: Jan Høydahl >Priority: Major > Time Spent: 3h 20m > Remaining Estimate: 0h > > The Lucene/Solr project depends on a number of third party libraries. Some of those > libraries contain security vulnerabilities. Upgrading to versions of those > libraries that have fixes for those vulnerabilities is a simple, critical > step we can take to improve the security of the system. But for that we need > a tool which can scan the Lucene/Solr dependencies and look up the security > database for known vulnerabilities. > I found that [OWASP > dependency-checker|https://jeremylong.github.io/DependencyCheck/dependency-check-ant/] > can be used for this purpose. It provides an ant task which we can include in > the Lucene/Solr build. We also need to figure out how (and when) to invoke > this dependency-checker. But this can be figured out once we complete the > first step of integrating this tool with the Lucene/Solr build system.
[jira] [Commented] (SOLR-12045) Move Analytics Component from contrib to core
[ https://issues.apache.org/jira/browse/SOLR-12045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023743#comment-17023743 ] Mikhail Khludnev commented on SOLR-12045: - I'm sorry. I'll fix it in a few hours. > Move Analytics Component from contrib to core > - > > Key: SOLR-12045 > URL: https://issues.apache.org/jira/browse/SOLR-12045 > Project: Solr > Issue Type: Improvement >Affects Versions: 8.0 >Reporter: Houston Putman >Priority: Major > Fix For: 8.1, master (9.0) > > Attachments: SOLR-12045.rb-visibility.patch > > Time Spent: 1.5h > Remaining Estimate: 0h > > The Analytics Component currently lives in contrib. Since it includes no > external dependencies, there is no harm in moving it into core solr. > The analytics component would be included as a default search component and > the analytics handler (currently only used for analytics shard requests, > might be transitioned to handle user requests in the future) would be > included as an implicit handler.
[jira] [Created] (LUCENE-9173) SynonymGraphFilter doesn't correctly consume decompounded tokens (branched token graph)
Tomoko Uchida created LUCENE-9173: - Summary: SynonymGraphFilter doesn't correctly consume decompounded tokens (branched token graph) Key: LUCENE-9173 URL: https://issues.apache.org/jira/browse/LUCENE-9173 Project: Lucene - Core Issue Type: Bug Components: modules/analysis Reporter: Tomoko Uchida This is a derived issue from LUCENE-9123. When the tokenizer that is given to SynonymGraphFilter decompounds tokens or emits multiple tokens at the same position, SynonymGraphFilter cannot correctly handle them (an exception will be thrown). For example, JapaneseTokenizer (mode=SEARCH) would emit a token and two decompounded tokens for the text "株式会社": {code:java} 株式会社 (positionIncrement=0, positionLength=2) 株式 (positionIncrement=1, positionLength=1) 会社 (positionIncrement=1, positionLength=1) {code} Then if we register the synonym "株式会社,コーポレーション" with SynonymGraphFilter (tokenizerFactory=JapaneseTokenizerFactory), this exception is thrown. {code:java} Caused by: java.lang.IllegalArgumentException: term: 株式会社 analyzed to a token (株式会社) with position increment != 1 (got: 0) at org.apache.lucene.analysis.synonym.SynonymMap$Parser.analyze(SynonymMap.java:325) ~[lucene-analyzers-common-8.4.0.jar:8.4.0 bc02ab906445fcf4e297f4ef00ab4a54fdd72ca2 - jpountz - 2019-12-19 20:16:38] at org.apache.lucene.analysis.synonym.SolrSynonymParser.addInternal(SolrSynonymParser.java:114) ~[lucene-analyzers-common-8.4.0.jar:8.4.0 bc02ab906445fcf4e297f4ef00ab4a54fdd72ca2 - jpountz - 2019-12-19 20:16:38] at org.apache.lucene.analysis.synonym.SolrSynonymParser.parse(SolrSynonymParser.java:70) ~[lucene-analyzers-common-8.4.0.jar:8.4.0 bc02ab906445fcf4e297f4ef00ab4a54fdd72ca2 - jpountz - 2019-12-19 20:16:38] at org.apache.lucene.analysis.synonym.SynonymGraphFilterFactory.loadSynonyms(SynonymGraphFilterFactory.java:179) ~[lucene-analyzers-common-8.4.0.jar:8.4.0 bc02ab906445fcf4e297f4ef00ab4a54fdd72ca2 - jpountz - 2019-12-19 20:16:38] at 
org.apache.lucene.analysis.synonym.SynonymGraphFilterFactory.inform(SynonymGraphFilterFactory.java:154) ~[lucene-analyzers-common-8.4.0.jar:8.4.0 bc02ab906445fcf4e297f4ef00ab4a54fdd72ca2 - jpountz - 2019-12-19 20:16:38] {code} This isn't limited to JapaneseTokenizer; it is a more general issue with handling branched token graphs (decompounded tokens in the midstream).
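The position-increment rule that SynonymMap's parser enforces can be illustrated with a small standalone sketch. The `Token` record and `validate` helper below are hypothetical stand-ins, not the actual Lucene code; the token values are taken from the JapaneseTokenizer example above:

```java
import java.util.List;

public class PosIncCheck {
    // Minimal stand-in for a token carrying graph attributes.
    record Token(String term, int posInc, int posLen) {}

    // SynonymMap.Parser.analyze() requires each token of an analyzed
    // synonym entry to advance by exactly one position; this helper
    // mimics that check (illustrative only, not the Lucene source).
    static void validate(List<Token> stream) {
        for (Token t : stream) {
            if (t.posInc() != 1) {
                throw new IllegalArgumentException(
                    "term: " + t.term() + " analyzed to a token (" + t.term()
                    + ") with position increment != 1 (got: " + t.posInc() + ")");
            }
        }
    }

    public static void main(String[] args) {
        // JapaneseTokenizer (mode=SEARCH) output for 株式会社 as reported
        // above: the compound token overlaps the decompounded tokens,
        // so its position increment is 0 -- a branched token graph.
        List<Token> decompounded = List.of(
            new Token("株式会社", 0, 2),
            new Token("株式", 1, 1),
            new Token("会社", 1, 1));
        try {
            validate(decompounded);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // same message as the stack trace above
        }
    }
}
```

A flat (mode=normal) stream, where every token has posInc=1, passes this check, which is why the issue only appears with decompounding enabled.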
[jira] [Commented] (SOLR-11207) Add OWASP dependency checker to detect security vulnerabilities in third party libraries
[ https://issues.apache.org/jira/browse/SOLR-11207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023754#comment-17023754 ] ASF subversion and git services commented on SOLR-11207: Commit 5ab59f59ac48c00c7f2047a92a5c7c0451490cf1 in lucene-solr's branch refs/heads/master from Dawid Weiss [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=5ab59f5 ] SOLR-11207: minor changes: - added 'owasp' task to the root project. This depends on dependencyCheckAggregate which seems to be a better fit for multi-module projects than dependencyCheckAnalyze (the difference is vague to me from plugin's documentation). - you can run the "gradlew owasp" task explicitly and it'll run the validation without any flags. - the owasp task is only added to check if validation.owasp property is true. I think this should stay as the default on non-CI systems (developer defaults) because it's a significant chunk of time it takes to download and validate dependencies. - I'm not sure *all* configurations should be included in the check... perhaps we should only limit ourselves to actual runtime dependencies not build dependencies, solr-ref-guide, etc. > Add OWASP dependency checker to detect security vulnerabilities in third > party libraries > > > Key: SOLR-11207 > URL: https://issues.apache.org/jira/browse/SOLR-11207 > Project: Solr > Issue Type: Improvement > Components: Build >Affects Versions: 6.0 >Reporter: Hrishikesh Gadre >Assignee: Jan Høydahl >Priority: Major > Time Spent: 3h 20m > Remaining Estimate: 0h > > Lucene/Solr project depends on number of third party libraries. Some of those > libraries contain security vulnerabilities. Upgrading to versions of those > libraries that have fixes for those vulnerabilities is a simple, critical > step we can take to improve the security of the system. But for that we need > a tool which can scan the Lucene/Solr dependencies and look up the security > database for known vulnerabilities. 
> I found that [OWASP > dependency-checker|https://jeremylong.github.io/DependencyCheck/dependency-check-ant/] > can be used for this purpose. It provides an ant task which we can include in > the Lucene/Solr build. We also need to figure out how (and when) to invoke > this dependency-checker. But this can be figured out once we complete the > first step of integrating this tool with the Lucene/Solr build system.
[jira] [Commented] (LUCENE-9123) JapaneseTokenizer with search mode doesn't work with SynonymGraphFilter
[ https://issues.apache.org/jira/browse/LUCENE-9123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023757#comment-17023757 ] Tomoko Uchida commented on LUCENE-9123: --- I opened an issue for the SynonymGraphFilter: LUCENE-9173. I also found an issue about multi-word synonyms, LUCENE-8137; it seems to be a different issue from the one discussed here (but I'm not fully sure of that). > JapaneseTokenizer with search mode doesn't work with SynonymGraphFilter > --- > > Key: LUCENE-9123 > URL: https://issues.apache.org/jira/browse/LUCENE-9123 > Project: Lucene - Core > Issue Type: Bug > Components: modules/analysis >Affects Versions: 8.4 >Reporter: Kazuaki Hiraga >Assignee: Tomoko Uchida >Priority: Major > Attachments: LUCENE-9123.patch, LUCENE-9123_8x.patch > > > JapaneseTokenizer with `mode=search` or `mode=extended` doesn't work with > both of SynonymGraphFilter and SynonymFilter when JT generates multiple > tokens as an output. If we use `mode=normal`, it should be fine. However, we > would like to use decomposed tokens that can maximize the chance to increase > recall. > Snippet of schema: > {code:xml} > positionIncrementGap="100" autoGeneratePhraseQueries="false"> > > > synonyms="lang/synonyms_ja.txt" > tokenizerFactory="solr.JapaneseTokenizerFactory"/> > > > tags="lang/stoptags_ja.txt" /> > > > > > > minimumLength="4"/> > > > > > {code} > A synonym entry that generates an error: > {noformat} > 株式会社,コーポレーション > {noformat} > The following is an output on console: > {noformat} > $ ./bin/solr create_core -c jp_test -d ../config/solrconfs > ERROR: Error CREATEing SolrCore 'jp_test': Unable to create core [jp_test3] > Caused by: term: 株式会社 analyzed to a token (株式会社) with position increment != 1 > (got: 0) > {noformat}
[jira] [Commented] (SOLR-12045) Move Analytics Component from contrib to core
[ https://issues.apache.org/jira/browse/SOLR-12045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023764#comment-17023764 ] Uwe Schindler commented on SOLR-12045: -- It was not your problem. The reason for the issue was a config change regarding line endings and then a bug in the source-patterns checker. It did not split lines correctly because the line splitter regex was broken ({{\n\r}} instead of the correct {{\r\n}}). Because you touched the file it was checked out again on Jenkins, suddenly having new line endings. > Move Analytics Component from contrib to core > - > > Key: SOLR-12045 > URL: https://issues.apache.org/jira/browse/SOLR-12045 > Project: Solr > Issue Type: Improvement >Affects Versions: 8.0 >Reporter: Houston Putman >Priority: Major > Fix For: 8.1, master (9.0) > > Attachments: SOLR-12045.rb-visibility.patch > > Time Spent: 1.5h > Remaining Estimate: 0h > > The Analytics Component currently lives in contrib. Since it includes no > external dependencies, there is no harm in moving it into core solr. > The analytics component would be included as a default search component and > the analytics handler (currently only used for analytics shard requests, > might be transitioned to handle user requests in the future) would be > included as an implicit handler.
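The swapped escape sequence Uwe describes is easy to reproduce in plain Java. This is a standalone sketch, not the actual validate-source-patterns checker code; the file content is made up for illustration:

```java
public class LineSplitBug {
    public static void main(String[] args) {
        // A two-line file with Windows (CRLF) line endings.
        String content = "line one\r\nline two";

        // Buggy pattern: \n\r never matches a CRLF sequence, so the
        // whole file is treated as one line and per-line checks such
        // as "Unescaped symbol on line #N" misfire.
        String[] buggy = content.split("\n\r");
        System.out.println(buggy.length); // 1

        // Corrected pattern: \r\n splits the file as intended.
        String[] fixed = content.split("\r\n");
        System.out.println(fixed.length); // 2
    }
}
```

The bug only surfaces once a file is actually checked out with CRLF endings, which matches the observation that precommit broke on Windows after the line-ending config change.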
[jira] [Commented] (SOLR-14189) Some whitespace characters bypass zero-length test in query parsers leading to 400 Bad Request
[ https://issues.apache.org/jira/browse/SOLR-14189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023765#comment-17023765 ] Uwe Schindler commented on SOLR-14189: -- I will merge this now. > Some whitespace characters bypass zero-length test in query parsers leading > to 400 Bad Request > -- > > Key: SOLR-14189 > URL: https://issues.apache.org/jira/browse/SOLR-14189 > Project: Solr > Issue Type: Improvement > Components: query parsers >Reporter: Andy Webb >Assignee: Uwe Schindler >Priority: Major > Time Spent: 1h 50m > Remaining Estimate: 0h > > The edismax and some other query parsers treat pure whitespace queries as > empty queries, but they use Java's {{String.trim()}} method to normalise > queries. That method [only treats characters 0-32 as > whitespace|https://docs.oracle.com/javase/8/docs/api/java/lang/String.html#trim--]. > Other whitespace characters exist - such as {{U+3000 IDEOGRAPHIC SPACE}} - > which bypass the test and lead to {{400 Bad Request}} responses - see for > example {{/solr/mycollection/select?q=%E3%80%80&defType=edismax}} vs > {{/solr/mycollection/select?q=%20&defType=edismax}}. The first fails with the > exception: > {noformat} > org.apache.solr.search.SyntaxError: Cannot parse '': Encountered "" at > line 1, column 0. Was expecting one of: ... "+" ... "-" ... > ... "(" ... "*" ... ... ... ... ... > ... "[" ... "{" ... ... "filter(" ... ... > ... > {noformat} > [PR 1172|https://github.com/apache/lucene-solr/pull/1172] updates the dismax, > edismax and rerank query parsers to use > [StringUtils.isWhitespace()|https://commons.apache.org/proper/commons-lang/apidocs/org/apache/commons/lang3/StringUtils.html#isWhitespace-java.lang.String-] > which is aware of all whitespace characters. 
> Prior to the change, rerank behaves differently for U+3000 and U+0020 - with > the change, both the below give the "mandatory parameter" message: > {{q=greetings&rq=\{!rerank%20reRankQuery=$rqq%20reRankDocs=1000%20reRankWeight=3}&rqq=%E3%80%80}} > - generic 400 Bad Request > {{q=greetings&rq=\{!rerank%20reRankQuery=$rqq%20reRankDocs=1000%20reRankWeight=3}&rqq=%20}} > - 400 reporting "reRankQuery parameter is mandatory"
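The core of the problem - {{String.trim()}} strips only code points up to U+0020 - can be demonstrated in plain Java. This standalone sketch uses the JDK's {{Character.isWhitespace()}} and {{String.isBlank()}} (Java 11+) as equivalents of the commons-lang {{StringUtils}} check that the PR adopts:

```java
public class TrimVsBlank {
    public static void main(String[] args) {
        String ascii = " ";      // U+0020 SPACE
        String ideo = "\u3000";  // U+3000 IDEOGRAPHIC SPACE

        // String.trim() strips only code points <= U+0020, so the
        // ideographic space survives, the query is not detected as
        // empty, and the parser fails with a SyntaxError.
        System.out.println(ascii.trim().isEmpty()); // true
        System.out.println(ideo.trim().isEmpty());  // false

        // Character.isWhitespace() (the check behind commons-lang
        // StringUtils.isBlank and JDK 11+ String.isBlank) does
        // recognize U+3000 as whitespace.
        System.out.println(Character.isWhitespace('\u3000')); // true
        System.out.println(ideo.isBlank());                   // true
    }
}
```

With the whitespace-aware check, a query of pure U+3000 is normalised to the same empty-query path as a query of ordinary spaces, avoiding the 400 Bad Request.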
[jira] [Assigned] (SOLR-14189) Some whitespace characters bypass zero-length test in query parsers leading to 400 Bad Request
[ https://issues.apache.org/jira/browse/SOLR-14189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler reassigned SOLR-14189: Assignee: Uwe Schindler > Some whitespace characters bypass zero-length test in query parsers leading > to 400 Bad Request > -- > > Key: SOLR-14189 > URL: https://issues.apache.org/jira/browse/SOLR-14189 > Project: Solr > Issue Type: Improvement > Components: query parsers >Reporter: Andy Webb >Assignee: Uwe Schindler >Priority: Major > Time Spent: 1h 50m > Remaining Estimate: 0h > > The edismax and some other query parsers treat pure whitespace queries as > empty queries, but they use Java's {{String.trim()}} method to normalise > queries. That method [only treats characters 0-32 as > whitespace|https://docs.oracle.com/javase/8/docs/api/java/lang/String.html#trim--]. > Other whitespace characters exist - such as {{U+3000 IDEOGRAPHIC SPACE}} - > which bypass the test and lead to {{400 Bad Request}} responses - see for > example {{/solr/mycollection/select?q=%E3%80%80&defType=edismax}} vs > {{/solr/mycollection/select?q=%20&defType=edismax}}. The first fails with the > exception: > {noformat} > org.apache.solr.search.SyntaxError: Cannot parse '': Encountered "" at > line 1, column 0. Was expecting one of: ... "+" ... "-" ... > ... "(" ... "*" ... ... ... ... ... > ... "[" ... "{" ... ... "filter(" ... ... > ... > {noformat} > [PR 1172|https://github.com/apache/lucene-solr/pull/1172] updates the dismax, > edismax and rerank query parsers to use > [StringUtils.isWhitespace()|https://commons.apache.org/proper/commons-lang/apidocs/org/apache/commons/lang3/StringUtils.html#isWhitespace-java.lang.String-] > which is aware of all whitespace characters. 
> Prior to the change, rerank behaves differently for U+3000 and U+0020 - with > the change, both the below give the "mandatory parameter" message: > {{q=greetings&rq=\{!rerank%20reRankQuery=$rqq%20reRankDocs=1000%20reRankWeight=3}&rqq=%E3%80%80}} > - generic 400 Bad Request > {{q=greetings&rq=\{!rerank%20reRankQuery=$rqq%20reRankDocs=1000%20reRankWeight=3}&rqq=%20}} > - 400 reporting "reRankQuery parameter is mandatory"
[jira] [Updated] (SOLR-14189) Some whitespace characters bypass zero-length test in query parsers leading to 400 Bad Request
[ https://issues.apache.org/jira/browse/SOLR-14189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated SOLR-14189: - Fix Version/s: 8.5 master (9.0) > Some whitespace characters bypass zero-length test in query parsers leading > to 400 Bad Request > -- > > Key: SOLR-14189 > URL: https://issues.apache.org/jira/browse/SOLR-14189 > Project: Solr > Issue Type: Improvement > Components: query parsers >Reporter: Andy Webb >Assignee: Uwe Schindler >Priority: Major > Fix For: master (9.0), 8.5 > > Time Spent: 1h 50m > Remaining Estimate: 0h > > The edismax and some other query parsers treat pure whitespace queries as > empty queries, but they use Java's {{String.trim()}} method to normalise > queries. That method [only treats characters 0-32 as > whitespace|https://docs.oracle.com/javase/8/docs/api/java/lang/String.html#trim--]. > Other whitespace characters exist - such as {{U+3000 IDEOGRAPHIC SPACE}} - > which bypass the test and lead to {{400 Bad Request}} responses - see for > example {{/solr/mycollection/select?q=%E3%80%80&defType=edismax}} vs > {{/solr/mycollection/select?q=%20&defType=edismax}}. The first fails with the > exception: > {noformat} > org.apache.solr.search.SyntaxError: Cannot parse '': Encountered "" at > line 1, column 0. Was expecting one of: ... "+" ... "-" ... > ... "(" ... "*" ... ... ... ... ... > ... "[" ... "{" ... ... "filter(" ... ... > ... > {noformat} > [PR 1172|https://github.com/apache/lucene-solr/pull/1172] updates the dismax, > edismax and rerank query parsers to use > [StringUtils.isWhitespace()|https://commons.apache.org/proper/commons-lang/apidocs/org/apache/commons/lang3/StringUtils.html#isWhitespace-java.lang.String-] > which is aware of all whitespace characters. 
> Prior to the change, rerank behaves differently for U+3000 and U+0020 - with > the change, both the below give the "mandatory parameter" message: > {{q=greetings&rq=\{!rerank%20reRankQuery=$rqq%20reRankDocs=1000%20reRankWeight=3}&rqq=%E3%80%80}} > - generic 400 Bad Request > {{q=greetings&rq=\{!rerank%20reRankQuery=$rqq%20reRankDocs=1000%20reRankWeight=3}&rqq=%20}} > - 400 reporting "reRankQuery parameter is mandatory"
[jira] [Commented] (SOLR-14189) Some whitespace characters bypass zero-length test in query parsers leading to 400 Bad Request
[ https://issues.apache.org/jira/browse/SOLR-14189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023766#comment-17023766 ] ASF subversion and git services commented on SOLR-14189: Commit efd0e8f3e89a954fcb870c9fab18cf19bcdbf97e in lucene-solr's branch refs/heads/master from andywebb1975 [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=efd0e8f ] SOLR-14189 switch from String.trim() to StringUtils.isBlank() (#1172) > Some whitespace characters bypass zero-length test in query parsers leading > to 400 Bad Request > -- > > Key: SOLR-14189 > URL: https://issues.apache.org/jira/browse/SOLR-14189 > Project: Solr > Issue Type: Improvement > Components: query parsers >Reporter: Andy Webb >Assignee: Uwe Schindler >Priority: Major > Fix For: master (9.0), 8.5 > > Time Spent: 2h > Remaining Estimate: 0h > > The edismax and some other query parsers treat pure whitespace queries as > empty queries, but they use Java's {{String.trim()}} method to normalise > queries. That method [only treats characters 0-32 as > whitespace|https://docs.oracle.com/javase/8/docs/api/java/lang/String.html#trim--]. > Other whitespace characters exist - such as {{U+3000 IDEOGRAPHIC SPACE}} - > which bypass the test and lead to {{400 Bad Request}} responses - see for > example {{/solr/mycollection/select?q=%E3%80%80&defType=edismax}} vs > {{/solr/mycollection/select?q=%20&defType=edismax}}. The first fails with the > exception: > {noformat} > org.apache.solr.search.SyntaxError: Cannot parse '': Encountered "" at > line 1, column 0. Was expecting one of: ... "+" ... "-" ... > ... "(" ... "*" ... ... ... ... ... > ... "[" ... "{" ... ... "filter(" ... ... > ... 
> {noformat} > [PR 1172|https://github.com/apache/lucene-solr/pull/1172] updates the dismax, > edismax and rerank query parsers to use > [StringUtils.isWhitespace()|https://commons.apache.org/proper/commons-lang/apidocs/org/apache/commons/lang3/StringUtils.html#isWhitespace-java.lang.String-] > which is aware of all whitespace characters. > Prior to the change, rerank behaves differently for U+3000 and U+0020 - with > the change, both the below give the "mandatory parameter" message: > {{q=greetings&rq=\{!rerank%20reRankQuery=$rqq%20reRankDocs=1000%20reRankWeight=3}&rqq=%E3%80%80}} > - generic 400 Bad Request > {{q=greetings&rq=\{!rerank%20reRankQuery=$rqq%20reRankDocs=1000%20reRankWeight=3}&rqq=%20}} > - 400 reporting "reRankQuery parameter is mandatory"
[GitHub] [lucene-solr] uschindler merged pull request #1172: SOLR-14189 switch from String.trim() to StringUtils.isBlank()
uschindler merged pull request #1172: SOLR-14189 switch from String.trim() to StringUtils.isBlank() URL: https://github.com/apache/lucene-solr/pull/1172
[jira] [Commented] (SOLR-14189) Some whitespace characters bypass zero-length test in query parsers leading to 400 Bad Request
[ https://issues.apache.org/jira/browse/SOLR-14189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023767#comment-17023767 ] ASF subversion and git services commented on SOLR-14189: Commit fd49c903b8193aa27c56655915c1bf741135fa18 in lucene-solr's branch refs/heads/master from Uwe Schindler [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=fd49c90 ] SOLR-14189: Add changes entry > Some whitespace characters bypass zero-length test in query parsers leading > to 400 Bad Request > -- > > Key: SOLR-14189 > URL: https://issues.apache.org/jira/browse/SOLR-14189 > Project: Solr > Issue Type: Improvement > Components: query parsers >Reporter: Andy Webb >Assignee: Uwe Schindler >Priority: Major > Fix For: master (9.0), 8.5 > > Time Spent: 2h > Remaining Estimate: 0h > > The edismax and some other query parsers treat pure whitespace queries as > empty queries, but they use Java's {{String.trim()}} method to normalise > queries. That method [only treats characters 0-32 as > whitespace|https://docs.oracle.com/javase/8/docs/api/java/lang/String.html#trim--]. > Other whitespace characters exist - such as {{U+3000 IDEOGRAPHIC SPACE}} - > which bypass the test and lead to {{400 Bad Request}} responses - see for > example {{/solr/mycollection/select?q=%E3%80%80&defType=edismax}} vs > {{/solr/mycollection/select?q=%20&defType=edismax}}. The first fails with the > exception: > {noformat} > org.apache.solr.search.SyntaxError: Cannot parse '': Encountered "" at > line 1, column 0. Was expecting one of: ... "+" ... "-" ... > ... "(" ... "*" ... ... ... ... ... > ... "[" ... "{" ... ... "filter(" ... ... > ... 
> {noformat} > [PR 1172|https://github.com/apache/lucene-solr/pull/1172] updates the dismax, > edismax and rerank query parsers to use > [StringUtils.isWhitespace()|https://commons.apache.org/proper/commons-lang/apidocs/org/apache/commons/lang3/StringUtils.html#isWhitespace-java.lang.String-] > which is aware of all whitespace characters. > Prior to the change, rerank behaves differently for U+3000 and U+0020 - with > the change, both the below give the "mandatory parameter" message: > {{q=greetings&rq=\{!rerank%20reRankQuery=$rqq%20reRankDocs=1000%20reRankWeight=3}&rqq=%E3%80%80}} > - generic 400 Bad Request > {{q=greetings&rq=\{!rerank%20reRankQuery=$rqq%20reRankDocs=1000%20reRankWeight=3}&rqq=%20}} > - 400 reporting "reRankQuery parameter is mandatory"
[jira] [Commented] (SOLR-14189) Some whitespace characters bypass zero-length test in query parsers leading to 400 Bad Request
[ https://issues.apache.org/jira/browse/SOLR-14189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023769#comment-17023769 ] ASF subversion and git services commented on SOLR-14189: Commit e934c8a7caee42565bd4c3982e6b46a561ebecfe in lucene-solr's branch refs/heads/branch_8x from Uwe Schindler [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=e934c8a ] SOLR-14189: Add changes entry > Some whitespace characters bypass zero-length test in query parsers leading > to 400 Bad Request > -- > > Key: SOLR-14189 > URL: https://issues.apache.org/jira/browse/SOLR-14189 > Project: Solr > Issue Type: Improvement > Components: query parsers >Reporter: Andy Webb >Assignee: Uwe Schindler >Priority: Major > Fix For: master (9.0), 8.5 > > Time Spent: 2h > Remaining Estimate: 0h > > The edismax and some other query parsers treat pure whitespace queries as > empty queries, but they use Java's {{String.trim()}} method to normalise > queries. That method [only treats characters 0-32 as > whitespace|https://docs.oracle.com/javase/8/docs/api/java/lang/String.html#trim--]. > Other whitespace characters exist - such as {{U+3000 IDEOGRAPHIC SPACE}} - > which bypass the test and lead to {{400 Bad Request}} responses - see for > example {{/solr/mycollection/select?q=%E3%80%80&defType=edismax}} vs > {{/solr/mycollection/select?q=%20&defType=edismax}}. The first fails with the > exception: > {noformat} > org.apache.solr.search.SyntaxError: Cannot parse '': Encountered "" at > line 1, column 0. Was expecting one of: ... "+" ... "-" ... > ... "(" ... "*" ... ... ... ... ... > ... "[" ... "{" ... ... "filter(" ... ... > ... 
> {noformat} > [PR 1172|https://github.com/apache/lucene-solr/pull/1172] updates the dismax, > edismax and rerank query parsers to use > [StringUtils.isWhitespace()|https://commons.apache.org/proper/commons-lang/apidocs/org/apache/commons/lang3/StringUtils.html#isWhitespace-java.lang.String-] > which is aware of all whitespace characters. > Prior to the change, rerank behaves differently for U+3000 and U+0020 - with > the change, both the below give the "mandatory parameter" message: > {{q=greetings&rq=\{!rerank%20reRankQuery=$rqq%20reRankDocs=1000%20reRankWeight=3}&rqq=%E3%80%80}} > - generic 400 Bad Request > {{q=greetings&rq=\{!rerank%20reRankQuery=$rqq%20reRankDocs=1000%20reRankWeight=3}&rqq=%20}} > - 400 reporting "reRankQuery parameter is mandatory" -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14189) Some whitespace characters bypass zero-length test in query parsers leading to 400 Bad Request
[ https://issues.apache.org/jira/browse/SOLR-14189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023768#comment-17023768 ] ASF subversion and git services commented on SOLR-14189: Commit 43085edaa6954f212d1a7f19a2f60e3d0de73ae6 in lucene-solr's branch refs/heads/branch_8x from andywebb1975 [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=43085ed ] SOLR-14189 switch from String.trim() to StringUtils.isBlank() (#1172) > Some whitespace characters bypass zero-length test in query parsers leading > to 400 Bad Request > -- > > Key: SOLR-14189 > URL: https://issues.apache.org/jira/browse/SOLR-14189 > Project: Solr > Issue Type: Improvement > Components: query parsers >Reporter: Andy Webb >Assignee: Uwe Schindler >Priority: Major > Fix For: master (9.0), 8.5 > > Time Spent: 2h > Remaining Estimate: 0h > > The edismax and some other query parsers treat pure whitespace queries as > empty queries, but they use Java's {{String.trim()}} method to normalise > queries. That method [only treats characters 0-32 as > whitespace|https://docs.oracle.com/javase/8/docs/api/java/lang/String.html#trim--]. > Other whitespace characters exist - such as {{U+3000 IDEOGRAPHIC SPACE}} - > which bypass the test and lead to {{400 Bad Request}} responses - see for > example {{/solr/mycollection/select?q=%E3%80%80&defType=edismax}} vs > {{/solr/mycollection/select?q=%20&defType=edismax}}. The first fails with the > exception: > {noformat} > org.apache.solr.search.SyntaxError: Cannot parse '': Encountered "" at > line 1, column 0. Was expecting one of: ... "+" ... "-" ... > ... "(" ... "*" ... ... ... ... ... > ... "[" ... "{" ... ... "filter(" ... ... > ... 
> {noformat} > [PR 1172|https://github.com/apache/lucene-solr/pull/1172] updates the dismax, > edismax and rerank query parsers to use > [StringUtils.isWhitespace()|https://commons.apache.org/proper/commons-lang/apidocs/org/apache/commons/lang3/StringUtils.html#isWhitespace-java.lang.String-] > which is aware of all whitespace characters. > Prior to the change, rerank behaves differently for U+3000 and U+0020 - with > the change, both the below give the "mandatory parameter" message: > {{q=greetings&rq=\{!rerank%20reRankQuery=$rqq%20reRankDocs=1000%20reRankWeight=3}&rqq=%E3%80%80}} > - generic 400 Bad Request > {{q=greetings&rq=\{!rerank%20reRankQuery=$rqq%20reRankDocs=1000%20reRankWeight=3}&rqq=%20}} > - 400 reporting "reRankQuery parameter is mandatory" -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (SOLR-14189) Some whitespace characters bypass zero-length test in query parsers leading to 400 Bad Request
[ https://issues.apache.org/jira/browse/SOLR-14189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated SOLR-14189: - Resolution: Fixed Status: Resolved (was: Patch Available) > Some whitespace characters bypass zero-length test in query parsers leading > to 400 Bad Request > -- > > Key: SOLR-14189 > URL: https://issues.apache.org/jira/browse/SOLR-14189 > Project: Solr > Issue Type: Improvement > Components: query parsers >Reporter: Andy Webb >Assignee: Uwe Schindler >Priority: Major > Fix For: master (9.0), 8.5 > > Time Spent: 2h > Remaining Estimate: 0h > > The edismax and some other query parsers treat pure whitespace queries as > empty queries, but they use Java's {{String.trim()}} method to normalise > queries. That method [only treats characters 0-32 as > whitespace|https://docs.oracle.com/javase/8/docs/api/java/lang/String.html#trim--]. > Other whitespace characters exist - such as {{U+3000 IDEOGRAPHIC SPACE}} - > which bypass the test and lead to {{400 Bad Request}} responses - see for > example {{/solr/mycollection/select?q=%E3%80%80&defType=edismax}} vs > {{/solr/mycollection/select?q=%20&defType=edismax}}. The first fails with the > exception: > {noformat} > org.apache.solr.search.SyntaxError: Cannot parse '': Encountered "" at > line 1, column 0. Was expecting one of: ... "+" ... "-" ... > ... "(" ... "*" ... ... ... ... ... > ... "[" ... "{" ... ... "filter(" ... ... > ... > {noformat} > [PR 1172|https://github.com/apache/lucene-solr/pull/1172] updates the dismax, > edismax and rerank query parsers to use > [StringUtils.isWhitespace()|https://commons.apache.org/proper/commons-lang/apidocs/org/apache/commons/lang3/StringUtils.html#isWhitespace-java.lang.String-] > which is aware of all whitespace characters. 
> Prior to the change, rerank behaves differently for U+3000 and U+0020 - with > the change, both the below give the "mandatory parameter" message: > {{q=greetings&rq=\{!rerank%20reRankQuery=$rqq%20reRankDocs=1000%20reRankWeight=3}&rqq=%E3%80%80}} > - generic 400 Bad Request > {{q=greetings&rq=\{!rerank%20reRankQuery=$rqq%20reRankDocs=1000%20reRankWeight=3}&rqq=%20}} > - 400 reporting "reRankQuery parameter is mandatory" -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14189) Some whitespace characters bypass zero-length test in query parsers leading to 400 Bad Request
[ https://issues.apache.org/jira/browse/SOLR-14189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023770#comment-17023770 ] Uwe Schindler commented on SOLR-14189: -- Thanks Andy! > Some whitespace characters bypass zero-length test in query parsers leading > to 400 Bad Request > -- > > Key: SOLR-14189 > URL: https://issues.apache.org/jira/browse/SOLR-14189 > Project: Solr > Issue Type: Improvement > Components: query parsers >Reporter: Andy Webb >Assignee: Uwe Schindler >Priority: Major > Fix For: master (9.0), 8.5 > > Time Spent: 2h > Remaining Estimate: 0h > > The edismax and some other query parsers treat pure whitespace queries as > empty queries, but they use Java's {{String.trim()}} method to normalise > queries. That method [only treats characters 0-32 as > whitespace|https://docs.oracle.com/javase/8/docs/api/java/lang/String.html#trim--]. > Other whitespace characters exist - such as {{U+3000 IDEOGRAPHIC SPACE}} - > which bypass the test and lead to {{400 Bad Request}} responses - see for > example {{/solr/mycollection/select?q=%E3%80%80&defType=edismax}} vs > {{/solr/mycollection/select?q=%20&defType=edismax}}. The first fails with the > exception: > {noformat} > org.apache.solr.search.SyntaxError: Cannot parse '': Encountered "" at > line 1, column 0. Was expecting one of: ... "+" ... "-" ... > ... "(" ... "*" ... ... ... ... ... > ... "[" ... "{" ... ... "filter(" ... ... > ... > {noformat} > [PR 1172|https://github.com/apache/lucene-solr/pull/1172] updates the dismax, > edismax and rerank query parsers to use > [StringUtils.isWhitespace()|https://commons.apache.org/proper/commons-lang/apidocs/org/apache/commons/lang3/StringUtils.html#isWhitespace-java.lang.String-] > which is aware of all whitespace characters. 
> Prior to the change, rerank behaves differently for U+3000 and U+0020 - with > the change, both the below give the "mandatory parameter" message: > {{q=greetings&rq=\{!rerank%20reRankQuery=$rqq%20reRankDocs=1000%20reRankWeight=3}&rqq=%E3%80%80}} > - generic 400 Bad Request > {{q=greetings&rq=\{!rerank%20reRankQuery=$rqq%20reRankDocs=1000%20reRankWeight=3}&rqq=%20}} > - 400 reporting "reRankQuery parameter is mandatory" -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
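The root cause described above is that String.trim() only strips code points at or below U+0020, while StringUtils.isWhitespace() delegates to Character.isWhitespace() for each character. A minimal, dependency-free sketch of the difference (WhitespaceCheck and isAllWhitespace are illustrative names, not code from the patch):

```java
public class WhitespaceCheck {
    /** True only when the string is non-empty and every char is Unicode whitespace. */
    public static boolean isAllWhitespace(String s) {
        return !s.isEmpty() && s.chars().allMatch(Character::isWhitespace);
    }

    public static void main(String[] args) {
        String ideographicSpace = "\u3000"; // U+3000 IDEOGRAPHIC SPACE
        // String.trim() only strips chars <= U+0020, so U+3000 survives it:
        System.out.println(ideographicSpace.trim().isEmpty());   // false
        // A Unicode-aware per-character test recognises it as whitespace:
        System.out.println(isAllWhitespace(ideographicSpace));   // true
        System.out.println(isAllWhitespace(" "));                // true
    }
}
```

This is why a query consisting solely of U+3000 slipped past the zero-length check and reached the parser as a non-empty string.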
[jira] [Commented] (LUCENE-9123) JapaneseTokenizer with search mode doesn't work with SynonymGraphFilter
[ https://issues.apache.org/jira/browse/LUCENE-9123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023778#comment-17023778 ] Tomoko Uchida commented on LUCENE-9123: --- When reproducing this issue I noticed that JapaneseTokenizer (mode=search) gives positionIncrements=1 for the decompounded token "株式" instead of 0. This looks strange to me; is this expected behaviour? If not, it may affect the synonym handling. > JapaneseTokenizer with search mode doesn't work with SynonymGraphFilter > --- > > Key: LUCENE-9123 > URL: https://issues.apache.org/jira/browse/LUCENE-9123 > Project: Lucene - Core > Issue Type: Bug > Components: modules/analysis >Affects Versions: 8.4 >Reporter: Kazuaki Hiraga >Assignee: Tomoko Uchida >Priority: Major > Attachments: LUCENE-9123.patch, LUCENE-9123_8x.patch > > > JapaneseTokenizer with `mode=search` or `mode=extended` doesn't work with > either SynonymGraphFilter or SynonymFilter when JT generates multiple > tokens as output. If we use `mode=normal`, it should be fine. However, we > would like to use decomposed tokens that can maximize the chance of increasing > recall. 
> Snippet of schema (the XML element names were stripped by the mail archive; reconstructed here from Solr's stock text_ja fieldType, which matches the surviving attributes): > {code:xml} > <fieldType name="text_ja" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="false"> > <analyzer> > <tokenizer class="solr.JapaneseTokenizerFactory" mode="search"/> > <filter class="solr.SynonymGraphFilterFactory" synonyms="lang/synonyms_ja.txt" > tokenizerFactory="solr.JapaneseTokenizerFactory"/> > <filter class="solr.JapaneseBaseFormFilterFactory"/> > <filter class="solr.JapanesePartOfSpeechStopFilterFactory" tags="lang/stoptags_ja.txt" /> > <filter class="solr.CJKWidthFilterFactory"/> > <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_ja.txt"/> > <filter class="solr.JapaneseKatakanaStemFilterFactory" minimumLength="4"/> > <filter class="solr.LowerCaseFilterFactory"/> > </analyzer> > </fieldType> > {code} > A synonym entry that triggers the error: > {noformat} > 株式会社,コーポレーション > {noformat} > The console output: > {noformat} > $ ./bin/solr create_core -c jp_test -d ../config/solrconfs > ERROR: Error CREATEing SolrCore 'jp_test': Unable to create core [jp_test3] > Caused by: term: 株式会社 analyzed to a token (株式会社) with position increment != 1 > (got: 0) > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
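The "position increment != 1" error is raised while SynonymMap's parser re-analyzes each synonym entry with the configured tokenizer. A simplified, dependency-free sketch of that validation (Token and validate are illustrative stand-ins for the Lucene internals, not the actual SynonymMap code):

```java
import java.util.List;

public class SynonymAnalyzeSketch {
    /** Hypothetical stand-in for a Lucene token: term text plus position increment. */
    public static final class Token {
        final String term;
        final int posInc;
        public Token(String term, int posInc) { this.term = term; this.posInc = posInc; }
    }

    /**
     * Simplified version of the guard in SynonymMap.Parser.analyze: every token
     * produced by analyzing a synonym entry must advance the position by exactly 1.
     * A decompounding tokenizer that emits a token at the same position as the
     * previous one (posInc == 0, i.e. a branch in the token graph) trips this check.
     */
    public static void validate(String entry, List<Token> analyzed) {
        for (Token t : analyzed) {
            if (t.posInc != 1) {
                throw new IllegalArgumentException("term: " + entry
                    + " analyzed to a token (" + t.term
                    + ") with position increment != 1 (got: " + t.posInc + ")");
            }
        }
    }

    public static void main(String[] args) {
        // mode=normal: a single token per entry, passes the check
        validate("株式会社", List.of(new Token("株式会社", 1)));
        // mode=search: the compound token overlaps its decompounded parts, fails
        try {
            validate("株式会社", List.of(
                new Token("株式", 1), new Token("株式会社", 0), new Token("会社", 1)));
        } catch (IllegalArgumentException expected) {
            System.out.println(expected.getMessage());
        }
    }
}
```

This is why mode=normal works while mode=search and mode=extended fail: only the decompounding modes produce overlapping tokens for the synonym text.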
[jira] [Commented] (SOLR-14202) Old segments are not deleted after commit
[ https://issues.apache.org/jira/browse/SOLR-14202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023785#comment-17023785 ] Jörn Franke commented on SOLR-14202: ok, thanks for the feedback and details. I will check the attached program. > Old segments are not deleted after commit > - > > Key: SOLR-14202 > URL: https://issues.apache.org/jira/browse/SOLR-14202 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: SolrCloud >Affects Versions: 8.4 >Reporter: Jörn Franke >Priority: Major > Attachments: eoe.zip > > > The data directory of a collection is growing and growing. It seems that old > segments are not deleted. They are only deleted during startup of Solr. > How to reproduce: have any collection (e.g. the example collection) and start > indexing documents. Even during the indexing the data directory is growing > significantly - much more than expected (by several orders of magnitude). If certain > documents are updated (without significantly increasing the amount of data) > the index data directory again grows by several orders of magnitude. Even for small > collections the needed space explodes. > This reduces significantly if Solr is stopped and then started. During > startup (not shutdown) Solr purges all those segments if not needed (* > sometimes some, but not a significant amount, is deleted during shutdown). This > is of course not a good workaround for normal operations. > It does not seem to have an effect on queries (their performance does not seem > to change). > The configs have not changed before and after the upgrade (e.g. from Solr 8.2 > to 8.3 to 8.4, not across major versions), so I assume it could be related to > Solr 8.4. It may also have been in Solr 8.3 (not sure), but not in 8.2. > > IndexConfig is pretty much default: Lock type: native, autoCommit: 15000, > openSearcher=false, autoSoftCommit -1 (reproducible with autoCommit 5000). 
> Nevertheless, it did not happen in previous versions of Solr and the config > did not change. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-14202) Old segments are not deleted after commit
[ https://issues.apache.org/jira/browse/SOLR-14202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023785#comment-17023785 ] Jörn Franke edited comment on SOLR-14202 at 1/26/20 12:12 PM: -- ok, thanks for the feedback and details. I will check the attached program. I was suspecting the FreeTextLookupFactory for the suggester, but in the end I could not verify it. The strange thing is that the permissions/configurations etc. have not changed. was (Author: jornfranke): ok, thanks for the feedabck and details. I will check the attached program. > Old segments are not deleted after commit > - > > Key: SOLR-14202 > URL: https://issues.apache.org/jira/browse/SOLR-14202 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: SolrCloud >Affects Versions: 8.4 >Reporter: Jörn Franke >Priority: Major > Attachments: eoe.zip > > > The data directory of a collection is growing and growing. It seems that old > segments are not deleted. They are only deleted during startup of Solr. > How to reproduce: have any collection (e.g. the example collection) and start > indexing documents. Even during the indexing the data directory is growing > significantly - much more than expected (by several orders of magnitude). If certain > documents are updated (without significantly increasing the amount of data) > the index data directory again grows by several orders of magnitude. Even for small > collections the needed space explodes. > This reduces significantly if Solr is stopped and then started. During > startup (not shutdown) Solr purges all those segments if not needed (* > sometimes some, but not a significant amount, is deleted during shutdown). This > is of course not a good workaround for normal operations. > It does not seem to have an effect on queries (their performance does not seem > to change). > The configs have not changed before and after the upgrade (e.g. 
from Solr 8.2 > to 8.3 to 8.4, not across major versions), so I assume it could be related to > Solr 8.4. It may also have been in Solr 8.3 (not sure), but not in 8.2. > > IndexConfig is pretty much default: Lock type: native, autoCommit: 15000, > openSearcher=false, autoSoftCommit -1 (reproducible with autoCommit 5000). > Nevertheless, it did not happen in previous versions of Solr and the config > did not change. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14202) Old segments are not deleted after commit
[ https://issues.apache.org/jira/browse/SOLR-14202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023790#comment-17023790 ] Jörn Franke commented on SOLR-14202: The files are, by the way, deleted when Solr starts - not on shutdown. > Old segments are not deleted after commit > - > > Key: SOLR-14202 > URL: https://issues.apache.org/jira/browse/SOLR-14202 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: SolrCloud >Affects Versions: 8.4 >Reporter: Jörn Franke >Priority: Major > Attachments: eoe.zip > > > The data directory of a collection is growing and growing. It seems that old > segments are not deleted. They are only deleted during startup of Solr. > How to reproduce: have any collection (e.g. the example collection) and start > indexing documents. Even during the indexing the data directory is growing > significantly - much more than expected (by several orders of magnitude). If certain > documents are updated (without significantly increasing the amount of data) > the index data directory again grows by several orders of magnitude. Even for small > collections the needed space explodes. > This reduces significantly if Solr is stopped and then started. During > startup (not shutdown) Solr purges all those segments if not needed (* > sometimes some, but not a significant amount, is deleted during shutdown). This > is of course not a good workaround for normal operations. > It does not seem to have an effect on queries (their performance does not seem > to change). > The configs have not changed before and after the upgrade (e.g. from Solr 8.2 > to 8.3 to 8.4, not across major versions), so I assume it could be related to > Solr 8.4. It may also have been in Solr 8.3 (not sure), but not in 8.2. > > IndexConfig is pretty much default: Lock type: native, autoCommit: 15000, > openSearcher=false, autoSoftCommit -1 (reproducible with autoCommit 5000). 
> Nevertheless, it did not happen in previous versions of Solr and the config > did not change. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
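For reference, the autoCommit settings described by the reporter correspond to a solrconfig.xml fragment along these lines (a sketch reconstructed from the description, not the actual file attached to the issue):

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <!-- hard commit every 15s; the reporter also reproduced the issue at 5000 -->
    <maxTime>15000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <autoSoftCommit>
    <!-- -1 disables time-based soft commits -->
    <maxTime>-1</maxTime>
  </autoSoftCommit>
</updateHandler>
```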
[jira] [Commented] (SOLR-14189) Some whitespace characters bypass zero-length test in query parsers leading to 400 Bad Request
[ https://issues.apache.org/jira/browse/SOLR-14189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023821#comment-17023821 ] Andy Webb commented on SOLR-14189: -- Happy to help - thanks Uwe (and Christine)! > Some whitespace characters bypass zero-length test in query parsers leading > to 400 Bad Request > -- > > Key: SOLR-14189 > URL: https://issues.apache.org/jira/browse/SOLR-14189 > Project: Solr > Issue Type: Improvement > Components: query parsers >Reporter: Andy Webb >Assignee: Uwe Schindler >Priority: Major > Fix For: master (9.0), 8.5 > > Time Spent: 2h > Remaining Estimate: 0h > > The edismax and some other query parsers treat pure whitespace queries as > empty queries, but they use Java's {{String.trim()}} method to normalise > queries. That method [only treats characters 0-32 as > whitespace|https://docs.oracle.com/javase/8/docs/api/java/lang/String.html#trim--]. > Other whitespace characters exist - such as {{U+3000 IDEOGRAPHIC SPACE}} - > which bypass the test and lead to {{400 Bad Request}} responses - see for > example {{/solr/mycollection/select?q=%E3%80%80&defType=edismax}} vs > {{/solr/mycollection/select?q=%20&defType=edismax}}. The first fails with the > exception: > {noformat} > org.apache.solr.search.SyntaxError: Cannot parse '': Encountered "" at > line 1, column 0. Was expecting one of: ... "+" ... "-" ... > ... "(" ... "*" ... ... ... ... ... > ... "[" ... "{" ... ... "filter(" ... ... > ... > {noformat} > [PR 1172|https://github.com/apache/lucene-solr/pull/1172] updates the dismax, > edismax and rerank query parsers to use > [StringUtils.isWhitespace()|https://commons.apache.org/proper/commons-lang/apidocs/org/apache/commons/lang3/StringUtils.html#isWhitespace-java.lang.String-] > which is aware of all whitespace characters. 
> Prior to the change, rerank behaves differently for U+3000 and U+0020 - with > the change, both the below give the "mandatory parameter" message: > {{q=greetings&rq=\{!rerank%20reRankQuery=$rqq%20reRankDocs=1000%20reRankWeight=3}&rqq=%E3%80%80}} > - generic 400 Bad Request > {{q=greetings&rq=\{!rerank%20reRankQuery=$rqq%20reRankDocs=1000%20reRankWeight=3}&rqq=%20}} > - 400 reporting "reRankQuery parameter is mandatory" -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Created] (SOLR-14219) OverseerSolrResponse's serialVersionUID has changed
Andy Webb created SOLR-14219: Summary: OverseerSolrResponse's serialVersionUID has changed Key: SOLR-14219 URL: https://issues.apache.org/jira/browse/SOLR-14219 Project: Solr Issue Type: Bug Security Level: Public (Default Security Level. Issues are Public) Components: SolrCloud Reporter: Andy Webb When the {{useUnsafeOverseerResponse=true}} option introduced in SOLR-14095 is used, the serialized OverseerSolrResponse has a different serialVersionUID to earlier versions, making it backwards-incompatible. (PR incoming) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9123) JapaneseTokenizer with search mode doesn't work with SynonymGraphFilter
[ https://issues.apache.org/jira/browse/LUCENE-9123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023863#comment-17023863 ] Tomoko Uchida commented on LUCENE-9123: --- Thanks [~h.kazuaki] for updating the patches. +1, I will commit them with CHANGES and MIGRATE entries next weekend or so (sorry for the delay, I may not have time to test them locally right now). Meanwhile, can you tell us the e-mail address that will be logged as the author of the patch? > JapaneseTokenizer with search mode doesn't work with SynonymGraphFilter > --- > > Key: LUCENE-9123 > URL: https://issues.apache.org/jira/browse/LUCENE-9123 > Project: Lucene - Core > Issue Type: Bug > Components: modules/analysis >Affects Versions: 8.4 >Reporter: Kazuaki Hiraga >Assignee: Tomoko Uchida >Priority: Major > Attachments: LUCENE-9123.patch, LUCENE-9123_8x.patch > > > JapaneseTokenizer with `mode=search` or `mode=extended` doesn't work with > either SynonymGraphFilter or SynonymFilter when JT generates multiple > tokens as output. If we use `mode=normal`, it should be fine. However, we > would like to use decomposed tokens that can maximize the chance of increasing > recall. 
> Snippet of schema (the XML element names were stripped by the mail archive; reconstructed here from Solr's stock text_ja fieldType, which matches the surviving attributes): > {code:xml} > <fieldType name="text_ja" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="false"> > <analyzer> > <tokenizer class="solr.JapaneseTokenizerFactory" mode="search"/> > <filter class="solr.SynonymGraphFilterFactory" synonyms="lang/synonyms_ja.txt" > tokenizerFactory="solr.JapaneseTokenizerFactory"/> > <filter class="solr.JapaneseBaseFormFilterFactory"/> > <filter class="solr.JapanesePartOfSpeechStopFilterFactory" tags="lang/stoptags_ja.txt" /> > <filter class="solr.CJKWidthFilterFactory"/> > <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_ja.txt"/> > <filter class="solr.JapaneseKatakanaStemFilterFactory" minimumLength="4"/> > <filter class="solr.LowerCaseFilterFactory"/> > </analyzer> > </fieldType> > {code} > A synonym entry that triggers the error: > {noformat} > 株式会社,コーポレーション > {noformat} > The console output: > {noformat} > $ ./bin/solr create_core -c jp_test -d ../config/solrconfs > ERROR: Error CREATEing SolrCore 'jp_test': Unable to create core [jp_test3] > Caused by: term: 株式会社 analyzed to a token (株式会社) with position increment != 1 > (got: 0) > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] andywebb1975 opened a new pull request #1210: SOLR-14219 force serialVersionUID of OverseerSolrResponse
andywebb1975 opened a new pull request #1210: SOLR-14219 force serialVersionUID of OverseerSolrResponse URL: https://github.com/apache/lucene-solr/pull/1210 # Description When the useUnsafeOverseerResponse=true option introduced in SOLR-14095 is used, the serialized OverseerSolrResponse has a different serialVersionUID to earlier versions, making it backwards-incompatible. # Solution This PR forces the serialVersionUID of OverseerSolrResponse to be the same value as before. # Tests Tested in a prototype environment with a mix of 8.4.1 and master-branch Solr nodes. # Checklist Please review the following and check all that apply: - [ ] I have reviewed the guidelines for [How to Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms to the standards described there to the best of my ability. - [ ] I have created a Jira issue and added the issue ID to my pull request title. - [ ] I have given Solr maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended) - [ ] I have developed this patch against the `master` branch. - [ ] I have run `ant precommit` and the appropriate test suite. - [ ] I have added tests for my changes. - [ ] I have added documentation for the [Ref Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) (for Solr changes only). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (SOLR-14219) OverseerSolrResponse's serialVersionUID has changed
[ https://issues.apache.org/jira/browse/SOLR-14219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Webb updated SOLR-14219: - Description: When the {{useUnsafeOverseerResponse=true}} option introduced in SOLR-14095 is used, the serialized OverseerSolrResponse has a different serialVersionUID to earlier versions, making it backwards-incompatible. https://github.com/apache/lucene-solr/pull/1210 forces the serialVersionUID to its old value, so old and new nodes become compatible. was: When the {{useUnsafeOverseerResponse=true}} option introduced in SOLR-14095 is used, the serialized OverseerSolrResponse has a different serialVersionUID to earlier versions, making it backwards-incompatible. (PR incoming) > OverseerSolrResponse's serialVersionUID has changed > --- > > Key: SOLR-14219 > URL: https://issues.apache.org/jira/browse/SOLR-14219 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: SolrCloud >Reporter: Andy Webb >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > When the {{useUnsafeOverseerResponse=true}} option introduced in SOLR-14095 > is used, the serialized OverseerSolrResponse has a different serialVersionUID > to earlier versions, making it backwards-incompatible. > https://github.com/apache/lucene-solr/pull/1210 forces the serialVersionUID > to its old value, so old and new nodes become compatible. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
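Explicitly pinning serialVersionUID is the standard guard against this class of incompatibility: without the declaration, the JVM derives the UID from details of the class, so unrelated code changes alter it and deserialization across versions fails with InvalidClassException. A sketch with a hypothetical Response class (the pinned value below is the old UID quoted in the related SOLR-14095 comment; PinnedUidDemo is an illustrative name, not the actual patch):

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

public class PinnedUidDemo {
    /** Hypothetical stand-in for OverseerSolrResponse. */
    public static class Response implements Serializable {
        // Pinning the UID keeps streams written by older nodes readable: without
        // this line the UID is auto-generated from the class shape, and adding or
        // removing members silently changes it, breaking mixed-version clusters.
        private static final long serialVersionUID = 4721653044098960880L;
        String status;
    }

    public static void main(String[] args) {
        // The serialization machinery reports the declared UID, not a derived one.
        long uid = ObjectStreamClass.lookup(Response.class).getSerialVersionUID();
        System.out.println(uid == 4721653044098960880L);  // true
    }
}
```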
[jira] [Commented] (SOLR-14095) Replace Java serialization with Javabin in Overseer operations
[ https://issues.apache.org/jira/browse/SOLR-14095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023866#comment-17023866 ] Andy Webb commented on SOLR-14095: -- hi, I've been experimenting with upgrading to 8.5.0+ in a prototyping environment using {{useUnsafeOverseerResponse=true}} and have found that a mixed pool of older/newer nodes gives the exception {{java.io.InvalidClassException: org.apache.solr.cloud.OverseerSolrResponse; local class incompatible: stream classdesc serialVersionUID = 4721653044098960880, local class serialVersionUID = -3791204262816422245}} (or vice-versa, depending on which node is the overseer). I've attached a PR to SOLR-14219 which I've found resolved this issue - please would someone review this? thanks, Andy > Replace Java serialization with Javabin in Overseer operations > -- > > Key: SOLR-14095 > URL: https://issues.apache.org/jira/browse/SOLR-14095 > Project: Solr > Issue Type: Task >Reporter: Robert Muir >Assignee: Tomas Eduardo Fernandez Lobbe >Priority: Major > Fix For: master (9.0), 8.5 > > Attachments: SOLR-14095-json.patch, json-nl.patch > > Time Spent: 1h 20m > Remaining Estimate: 0h > > Removing the use of serialization is greatly preferred. > But if serialization over the wire must really happen, then we must use JDK's > serialization filtering capability to prevent havoc. > https://docs.oracle.com/javase/10/core/serialization-filtering1.htm#JSCOR-GUID-3ECB288D-E5BD-4412-892F-E9BB11D4C98A -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (SOLR-14219) OverseerSolrResponse's serialVersionUID has changed
[ https://issues.apache.org/jira/browse/SOLR-14219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Webb updated SOLR-14219: - Status: Patch Available (was: Open) > OverseerSolrResponse's serialVersionUID has changed > --- > > Key: SOLR-14219 > URL: https://issues.apache.org/jira/browse/SOLR-14219 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: SolrCloud >Reporter: Andy Webb >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > When the {{useUnsafeOverseerResponse=true}} option introduced in SOLR-14095 > is used, the serialized OverseerSolrResponse has a different serialVersionUID > to earlier versions, making it backwards-incompatible. > https://github.com/apache/lucene-solr/pull/1210 forces the serialVersionUID > to its old value, so old and new nodes become compatible. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (LUCENE-9173) SynonymGraphFilter doesn't correctly consume decompounded tokens (branched token graph)
[ https://issues.apache.org/jira/browse/LUCENE-9173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tomoko Uchida updated LUCENE-9173: -- Description: This is a derived issue from LUCENE-9123. When the tokenizer that is given to SynonymGraphFilter decompound tokens or emit multiple tokens at the same position, SynonymGraphFilter cannot correctly handle them (an exception will be thrown). For example, JapaneseTokenizer (mode=SEARCH) would emit a token and two decompounded tokens for the text "株式会社": {code:java} 株式会社 (positionIncrement=0, positionLength=2) 株式 (positionIncrement=1, positionLength=1) 会社 (positionIncrement=1, positionLength=1) {code} Then if we give a synonym "株式会社,コーポレーション" by SynonymGraphFilterFactory (set tokenizerFactory=JapaneseTokenizerFactory) this exception is thrown. {code:java} Caused by: java.lang.IllegalArgumentException: term: 株式会社 analyzed to a token (株式会社) with position increment != 1 (got: 0) at org.apache.lucene.analysis.synonym.SynonymMap$Parser.analyze(SynonymMap.java:325) ~[lucene-analyzers-common-8.4.0.jar:8.4.0 bc02ab906445fcf4e297f4ef00ab4a54fdd72ca2 - jpountz - 2019-12-19 20:16:38] at org.apache.lucene.analysis.synonym.SolrSynonymParser.addInternal(SolrSynonymParser.java:114) ~[lucene-analyzers-common-8.4.0.jar:8.4.0 bc02ab906445fcf4e297f4ef00ab4a54fdd72ca2 - jpountz - 2019-12-19 20:16:38] at org.apache.lucene.analysis.synonym.SolrSynonymParser.parse(SolrSynonymParser.java:70) ~[lucene-analyzers-common-8.4.0.jar:8.4.0 bc02ab906445fcf4e297f4ef00ab4a54fdd72ca2 - jpountz - 2019-12-19 20:16:38] at org.apache.lucene.analysis.synonym.SynonymGraphFilterFactory.loadSynonyms(SynonymGraphFilterFactory.java:179) ~[lucene-analyzers-common-8.4.0.jar:8.4.0 bc02ab906445fcf4e297f4ef00ab4a54fdd72ca2 - jpountz - 2019-12-19 20:16:38] at org.apache.lucene.analysis.synonym.SynonymGraphFilterFactory.inform(SynonymGraphFilterFactory.java:154) ~[lucene-analyzers-common-8.4.0.jar:8.4.0 bc02ab906445fcf4e297f4ef00ab4a54fdd72ca2 - 
jpountz - 2019-12-19 20:16:38] {code} This isn't only limited to JapaneseTokenizer but a more general issue about handling branched token graph (decompounded tokens in the midstream). was: This is a derived issue from LUCENE-9123. When the tokenizer that is given to SynonymGraphFilter decompound tokens or emit multiple tokens at the same position, SynonymGraphFilter cannot correctly handle them (an exception will be thrown). For example, JapaneseTokenizer (mode=SEARCH) would emit a token and two decompounded tokens for the text "株式会社": {code:java} 株式会社 (positionIncrement=0, positionLength=2) 株式 (positionIncrement=1, positionLength=1) 会社 (positionIncrement=1, positionLength=1) {code} Then if we give synonym "株式会社,コーポレーション" by SynonymGraphFilter (set tokenizerFactory=JapaneseTokenizerFactory) this exception is thrown. {code:java} Caused by: java.lang.IllegalArgumentException: term: 株式会社 analyzed to a token (株式会社) with position increment != 1 (got: 0) at org.apache.lucene.analysis.synonym.SynonymMap$Parser.analyze(SynonymMap.java:325) ~[lucene-analyzers-common-8.4.0.jar:8.4.0 bc02ab906445fcf4e297f4ef00ab4a54fdd72ca2 - jpountz - 2019-12-19 20:16:38] at org.apache.lucene.analysis.synonym.SolrSynonymParser.addInternal(SolrSynonymParser.java:114) ~[lucene-analyzers-common-8.4.0.jar:8.4.0 bc02ab906445fcf4e297f4ef00ab4a54fdd72ca2 - jpountz - 2019-12-19 20:16:38] at org.apache.lucene.analysis.synonym.SolrSynonymParser.parse(SolrSynonymParser.java:70) ~[lucene-analyzers-common-8.4.0.jar:8.4.0 bc02ab906445fcf4e297f4ef00ab4a54fdd72ca2 - jpountz - 2019-12-19 20:16:38] at org.apache.lucene.analysis.synonym.SynonymGraphFilterFactory.loadSynonyms(SynonymGraphFilterFactory.java:179) ~[lucene-analyzers-common-8.4.0.jar:8.4.0 bc02ab906445fcf4e297f4ef00ab4a54fdd72ca2 - jpountz - 2019-12-19 20:16:38] at org.apache.lucene.analysis.synonym.SynonymGraphFilterFactory.inform(SynonymGraphFilterFactory.java:154) ~[lucene-analyzers-common-8.4.0.jar:8.4.0 bc02ab906445fcf4e297f4ef00ab4a54fdd72ca2 
- jpountz - 2019-12-19 20:16:38] {code} This isn't only limited to JapaneseTokenizer but a more general issue about handling branched token graph (decompounded tokens in the midstream). > SynonymGraphFilter doesn't correctly consume decompounded tokens (branched > token graph) > > > Key: LUCENE-9173 > URL: https://issues.apache.org/jira/browse/LUCENE-9173 > Project: Lucene - Core > Issue Type: Bug > Components: modules/analysis >Reporter: Tomoko Uchida >Priority: Minor > > This is a derived issue from LUCENE-9123. > When the tokenizer that is given to SynonymGraphFilter decompound tokens or > emit multiple tokens at the same position, Synonym
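The failure mode in LUCENE-9173 can be modeled in miniature without Lucene on the classpath: SynonymMap.Parser.analyze rejects any analyzed synonym token whose position increment is not 1, which is exactly what a decompounding tokenizer produces for the overlapping compound token. The sketch below uses a hypothetical Token record (not Lucene's attribute API) and returns the error message instead of throwing, purely to keep the demo self-contained:

```java
import java.util.List;

// Minimal stand-in for one entry of an analyzed token stream.
// This is NOT Lucene's API -- just enough to model the check.
record Token(String term, int positionIncrement, int positionLength) {}

public class SynonymAnalyzeCheck {

    // Mirrors the guard in SynonymMap.Parser.analyze(): every analyzed
    // synonym token must advance the position by exactly 1. The real code
    // throws IllegalArgumentException with this message; we return it.
    static String analyze(String term, List<Token> tokens) {
        for (Token t : tokens) {
            if (t.positionIncrement() != 1) {
                return "term: " + term + " analyzed to a token (" + t.term()
                        + ") with position increment != 1 (got: "
                        + t.positionIncrement() + ")";
            }
        }
        return null; // graph accepted
    }

    static String demo() {
        // What JapaneseTokenizer (mode=SEARCH) emits for 株式会社,
        // per the description above: the compound plus two parts.
        List<Token> graph = List.of(
                new Token("株式会社", 0, 2),  // overlapping compound, posInc=0
                new Token("株式", 1, 1),
                new Token("会社", 1, 1));
        return analyze("株式会社", graph);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The overlapping compound token trips the posInc check immediately, which is why any branched token graph (not just JapaneseTokenizer's) hits this exception.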
[jira] [Comment Edited] (LUCENE-4702) Terms dictionary compression
[ https://issues.apache.org/jira/browse/LUCENE-4702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023871#comment-17023871 ] Adrien Grand edited comment on LUCENE-4702 at 1/26/20 5:09 PM: --- Nightly benchmarks are seeing worse slowdowns than what I was observing locally: - slower indexing http://people.apache.org/~mikemccand/lucenebench/indexing.html - slower fuzzy queries http://people.apache.org/~mikemccand/lucenebench/Fuzzy1.html - slower wildcard queries http://people.apache.org/~mikemccand/lucenebench/Wildcard.html - slower respell http://people.apache.org/~mikemccand/lucenebench/Respell.html Interestingly PK lookups got faster ( ! ) http://people.apache.org/~mikemccand/lucenebench/PKLookup.html and prefix queries are only barely slower http://people.apache.org/~mikemccand/lucenebench/Prefix3.html. I'll look into it. was (Author: jpountz): Nightly benchmarks are seeing worse slowdowns than what I was observing locally: - slower indexing http://people.apache.org/~mikemccand/lucenebench/indexing.html - slower fuzzy queries http://people.apache.org/~mikemccand/lucenebench/Fuzzy1.html - slower wildcard queries http://people.apache.org/~mikemccand/lucenebench/Wildcard.html - slower respell http://people.apache.org/~mikemccand/lucenebench/Respell.html Interestingly PK lookups got faster (!) http://people.apache.org/~mikemccand/lucenebench/PKLookup.html and prefix queries are only barely slower http://people.apache.org/~mikemccand/lucenebench/Prefix3.html. I'll look into it. 
> Terms dictionary compression > > > Key: LUCENE-4702 > URL: https://issues.apache.org/jira/browse/LUCENE-4702 > Project: Lucene - Core > Issue Type: Wish >Reporter: Adrien Grand >Assignee: Adrien Grand >Priority: Trivial > Attachments: LUCENE-4702.patch, LUCENE-4702.patch > > Time Spent: 3.5h > Remaining Estimate: 0h > > I've done a quick test with the block tree terms dictionary by replacing a > call to IndexOutput.writeBytes to write suffix bytes with a call to > LZ4.compressHC to test the peformance hit. Interestingly, search performance > was very good (see comparison table below) and the tim files were 14% smaller > (from 150432 bytes overall to 129516). > {noformat} > TaskQPS baseline StdDevQPS compressed StdDev > Pct diff > Fuzzy1 111.50 (2.0%) 78.78 (1.5%) > -29.4% ( -32% - -26%) > Fuzzy2 36.99 (2.7%) 28.59 (1.5%) > -22.7% ( -26% - -18%) > Respell 122.86 (2.1%) 103.89 (1.7%) > -15.4% ( -18% - -11%) > Wildcard 100.58 (4.3%) 94.42 (3.2%) > -6.1% ( -13% -1%) > Prefix3 124.90 (5.7%) 122.67 (4.7%) > -1.8% ( -11% -9%) >OrHighLow 169.87 (6.8%) 167.77 (8.0%) > -1.2% ( -15% - 14%) > LowTerm 1949.85 (4.5%) 1929.02 (3.4%) > -1.1% ( -8% -7%) > AndHighLow 2011.95 (3.5%) 1991.85 (3.3%) > -1.0% ( -7% -5%) > OrHighHigh 155.63 (6.7%) 154.12 (7.9%) > -1.0% ( -14% - 14%) > AndHighHigh 341.82 (1.2%) 339.49 (1.7%) > -0.7% ( -3% -2%) >OrHighMed 217.55 (6.3%) 216.16 (7.1%) > -0.6% ( -13% - 13%) > IntNRQ 53.10 (10.9%) 52.90 (8.6%) > -0.4% ( -17% - 21%) > MedTerm 998.11 (3.8%) 994.82 (5.6%) > -0.3% ( -9% -9%) > MedSpanNear 60.50 (3.7%) 60.36 (4.8%) > -0.2% ( -8% -8%) > HighSpanNear 19.74 (4.5%) 19.72 (5.1%) > -0.1% ( -9% -9%) > LowSpanNear 101.93 (3.2%) 101.82 (4.4%) > -0.1% ( -7% -7%) > AndHighMed 366.18 (1.7%) 366.93 (1.7%) > 0.2% ( -3% -3%) > PKLookup 237.28 (4.0%) 237.96 (4.2%) > 0.3% ( -7% -8%) >MedPhrase 173.17 (4.7%) 174.69 (4.7%) > 0.9% ( -8% - 10%) > LowSloppyPhrase 180.91 (2.6%) 182.79 (2.7%) > 1.0% ( -4% -6%) >LowPhrase 374.64 (5.5%) 379.11 (5.8%) > 1.2% ( -9% - 13%) > 
HighTerm 253.14 (7.9%) 256.97 (11.4%) > 1.5% ( -16% - 22%) > HighPhrase 19.52 (10.6%) 19.83 (11.0%) > 1.6% ( -18% - 25%) > MedSloppyPhrase 141.90 (2.6%) 144.11 (2.5%) > 1.6% (
[jira] [Commented] (LUCENE-4702) Terms dictionary compression
[ https://issues.apache.org/jira/browse/LUCENE-4702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023871#comment-17023871 ] Adrien Grand commented on LUCENE-4702: -- Nightly benchmarks are seeing worse slowdowns than what I was observing locally: - slower indexing http://people.apache.org/~mikemccand/lucenebench/indexing.html - slower fuzzy queries http://people.apache.org/~mikemccand/lucenebench/Fuzzy1.html - slower wildcard queries http://people.apache.org/~mikemccand/lucenebench/Wildcard.html - slower respell http://people.apache.org/~mikemccand/lucenebench/Respell.html Interestingly PK lookups got faster (!) http://people.apache.org/~mikemccand/lucenebench/PKLookup.html and prefix queries are only barely slower http://people.apache.org/~mikemccand/lucenebench/Prefix3.html. I'll look into it. > Terms dictionary compression > > > Key: LUCENE-4702 > URL: https://issues.apache.org/jira/browse/LUCENE-4702 > Project: Lucene - Core > Issue Type: Wish >Reporter: Adrien Grand >Assignee: Adrien Grand >Priority: Trivial > Attachments: LUCENE-4702.patch, LUCENE-4702.patch > > Time Spent: 3.5h > Remaining Estimate: 0h > > I've done a quick test with the block tree terms dictionary by replacing a > call to IndexOutput.writeBytes to write suffix bytes with a call to > LZ4.compressHC to test the peformance hit. Interestingly, search performance > was very good (see comparison table below) and the tim files were 14% smaller > (from 150432 bytes overall to 129516). 
> {noformat} > TaskQPS baseline StdDevQPS compressed StdDev > Pct diff > Fuzzy1 111.50 (2.0%) 78.78 (1.5%) > -29.4% ( -32% - -26%) > Fuzzy2 36.99 (2.7%) 28.59 (1.5%) > -22.7% ( -26% - -18%) > Respell 122.86 (2.1%) 103.89 (1.7%) > -15.4% ( -18% - -11%) > Wildcard 100.58 (4.3%) 94.42 (3.2%) > -6.1% ( -13% -1%) > Prefix3 124.90 (5.7%) 122.67 (4.7%) > -1.8% ( -11% -9%) >OrHighLow 169.87 (6.8%) 167.77 (8.0%) > -1.2% ( -15% - 14%) > LowTerm 1949.85 (4.5%) 1929.02 (3.4%) > -1.1% ( -8% -7%) > AndHighLow 2011.95 (3.5%) 1991.85 (3.3%) > -1.0% ( -7% -5%) > OrHighHigh 155.63 (6.7%) 154.12 (7.9%) > -1.0% ( -14% - 14%) > AndHighHigh 341.82 (1.2%) 339.49 (1.7%) > -0.7% ( -3% -2%) >OrHighMed 217.55 (6.3%) 216.16 (7.1%) > -0.6% ( -13% - 13%) > IntNRQ 53.10 (10.9%) 52.90 (8.6%) > -0.4% ( -17% - 21%) > MedTerm 998.11 (3.8%) 994.82 (5.6%) > -0.3% ( -9% -9%) > MedSpanNear 60.50 (3.7%) 60.36 (4.8%) > -0.2% ( -8% -8%) > HighSpanNear 19.74 (4.5%) 19.72 (5.1%) > -0.1% ( -9% -9%) > LowSpanNear 101.93 (3.2%) 101.82 (4.4%) > -0.1% ( -7% -7%) > AndHighMed 366.18 (1.7%) 366.93 (1.7%) > 0.2% ( -3% -3%) > PKLookup 237.28 (4.0%) 237.96 (4.2%) > 0.3% ( -7% -8%) >MedPhrase 173.17 (4.7%) 174.69 (4.7%) > 0.9% ( -8% - 10%) > LowSloppyPhrase 180.91 (2.6%) 182.79 (2.7%) > 1.0% ( -4% -6%) >LowPhrase 374.64 (5.5%) 379.11 (5.8%) > 1.2% ( -9% - 13%) > HighTerm 253.14 (7.9%) 256.97 (11.4%) > 1.5% ( -16% - 22%) > HighPhrase 19.52 (10.6%) 19.83 (11.0%) > 1.6% ( -18% - 25%) > MedSloppyPhrase 141.90 (2.6%) 144.11 (2.5%) > 1.6% ( -3% -6%) > HighSloppyPhrase 25.26 (4.8%) 25.97 (5.0%) > 2.8% ( -6% - 13%) > {noformat} > Only queries which are very terms-dictionary-intensive got a performance hit > (Fuzzy, Fuzzy2, Respell, Wildcard), other queries including Prefix3 behaved > (surprisingly) well. > Do you think of it as something worth exploring? 
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
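The experiment described in LUCENE-4702 — compressing each block's concatenated term suffixes instead of writing them raw — can be sketched generically. The patch uses Lucene's LZ4.compressHC; the stand-in below uses java.util.zip.Deflater so it runs standalone, and the byte layout is illustrative, not BlockTree's actual on-disk format:

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

public class SuffixCompressionSketch {

    // Compress a block of concatenated term suffixes, standing in for the
    // patch's LZ4.compressHC call that replaces IndexOutput.writeBytes.
    // Returns the compressed size in bytes.
    static int compressedSize(byte[] suffixBytes) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(suffixBytes);
        deflater.finish();
        // Output buffer generously sized so one deflate() call suffices.
        byte[] out = new byte[suffixBytes.length * 2 + 64];
        int n = deflater.deflate(out);
        deflater.end();
        return n;
    }

    public static void main(String[] args) {
        // Term suffixes within a block tend to share structure,
        // which is why the .tim files shrank by ~14% in the test.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 100; i++) sb.append("suffix_").append(i % 10);
        byte[] block = sb.toString().getBytes(StandardCharsets.UTF_8);
        System.out.println(block.length + " -> " + compressedSize(block));
    }
}
```

The benchmark trade-off follows directly: terms-dictionary-intensive queries (Fuzzy, Respell, Wildcard) pay the decompression cost on every block visit, while queries that touch few blocks barely notice.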
[jira] [Commented] (LUCENE-9146) Switch GitHub PR test from ant precommit to gradle
[ https://issues.apache.org/jira/browse/LUCENE-9146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023872#comment-17023872 ] Dawid Weiss commented on LUCENE-9146: - I think it can run both, at least at the beginning? > Switch GitHub PR test from ant precommit to gradle > -- > > Key: LUCENE-9146 > URL: https://issues.apache.org/jira/browse/LUCENE-9146 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Mike Drob >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (LUCENE-9140) Clean up Solr dependencies to use transitives and explicit exclusions
[ https://issues.apache.org/jira/browse/LUCENE-9140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Weiss updated LUCENE-9140: Priority: Critical (was: Major) > Clean up Solr dependencies to use transitives and explicit exclusions > - > > Key: LUCENE-9140 > URL: https://issues.apache.org/jira/browse/LUCENE-9140 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Dawid Weiss >Priority: Critical > > Many Solr dependencies in gradle are currently explicitly expanded into a > flat structure with { transitive = false }, reflecting ivy/ant build. > We should explicitly depend on what's really needed, allow for transitive > dependencies and exclude what's not required. This will make the dependency > graph clearer. We still have the warning check for creeping transitive > dependencies in the form of versions lock file and jar checksums. > A side effect would also be to figure out which scope dependencies belong to > (api level or internal). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Created] (LUCENE-9174) Bump default gradle memory to 2g
Dawid Weiss created LUCENE-9174: --- Summary: Bump default gradle memory to 2g Key: LUCENE-9174 URL: https://issues.apache.org/jira/browse/LUCENE-9174 Project: Lucene - Core Issue Type: Task Reporter: Dawid Weiss I see these from time to time so I'll bump the daemon's heap to 2 gigs. Don't know why it needs so much... {code} Expiring Daemon because JVM heap space is exhausted Daemon will be stopped at the end of the build after running out of JVM memory {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
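A daemon heap bump like this typically lands in the repo's gradle.properties; a representative entry would be along these lines (the exact flags in the committed change may differ):

```properties
# Give the Gradle daemon more heap so large multi-project builds
# don't die with "Expiring Daemon because JVM heap space is exhausted".
org.gradle.jvmargs=-Xmx2g
```

Gradle reads org.gradle.jvmargs when forking the daemon, so the setting applies to every build without per-invocation flags.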
[jira] [Commented] (LUCENE-9174) Bump default gradle memory to 2g
[ https://issues.apache.org/jira/browse/LUCENE-9174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023875#comment-17023875 ] ASF subversion and git services commented on LUCENE-9174: - Commit 6f85ec04602aa083b4512667d37e36e0213b5c35 in lucene-solr's branch refs/heads/master from Dawid Weiss [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=6f85ec0 ] LUCENE-9174: Bump default gradle memory to 2g > Bump default gradle memory to 2g > > > Key: LUCENE-9174 > URL: https://issues.apache.org/jira/browse/LUCENE-9174 > Project: Lucene - Core > Issue Type: Task >Reporter: Dawid Weiss >Priority: Major > > I see these from time to time so I'll bump the daemon's heap to 2 gigs. Don't > know why it needs to much... > {code} > Expiring Daemon because JVM heap space is exhausted > Daemon will be stopped at the end of the build after running out of JVM memory > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-9174) Bump default gradle memory to 2g
[ https://issues.apache.org/jira/browse/LUCENE-9174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Weiss resolved LUCENE-9174. - Assignee: Dawid Weiss Resolution: Fixed > Bump default gradle memory to 2g > > > Key: LUCENE-9174 > URL: https://issues.apache.org/jira/browse/LUCENE-9174 > Project: Lucene - Core > Issue Type: Task >Reporter: Dawid Weiss >Assignee: Dawid Weiss >Priority: Major > > I see these from time to time so I'll bump the daemon's heap to 2 gigs. Don't > know why it needs to much... > {code} > Expiring Daemon because JVM heap space is exhausted > Daemon will be stopped at the end of the build after running out of JVM memory > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-11207) Add OWASP dependency checker to detect security vulnerabilities in third party libraries
[ https://issues.apache.org/jira/browse/SOLR-11207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023881#comment-17023881 ] ASF subversion and git services commented on SOLR-11207: Commit 74a8d6d5acc67e4d5c6eeb640b8de3f820f0774b in lucene-solr's branch refs/heads/gradle-master from Jan Høydahl [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=74a8d6d ] SOLR-11207: Add OWASP dependency checker to gradle build (#1121) * SOLR-11207: Add OWASP dependency checker to gradle build > Add OWASP dependency checker to detect security vulnerabilities in third > party libraries > > > Key: SOLR-11207 > URL: https://issues.apache.org/jira/browse/SOLR-11207 > Project: Solr > Issue Type: Improvement > Components: Build >Affects Versions: 6.0 >Reporter: Hrishikesh Gadre >Assignee: Jan Høydahl >Priority: Major > Time Spent: 3h 20m > Remaining Estimate: 0h > > Lucene/Solr project depends on number of third party libraries. Some of those > libraries contain security vulnerabilities. Upgrading to versions of those > libraries that have fixes for those vulnerabilities is a simple, critical > step we can take to improve the security of the system. But for that we need > a tool which can scan the Lucene/Solr dependencies and look up the security > database for known vulnerabilities. > I found that [OWASP > dependency-checker|https://jeremylong.github.io/DependencyCheck/dependency-check-ant/] > can be used for this purpose. It provides a ant task which we can include in > the Lucene/Solr build. We also need to figure out how (and when) to invoke > this dependency-checker. But this can be figured out once we complete the > first step of integrating this tool with the Lucene/Solr build system. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-12930) Add developer documentation to source repo
[ https://issues.apache.org/jira/browse/SOLR-12930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023878#comment-17023878 ] ASF subversion and git services commented on SOLR-12930: Commit 74e88deba78ea40f81c3072d6e014903773f4e92 in lucene-solr's branch refs/heads/gradle-master from Cassandra Targett [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=74e88de ] Revert "SOLR-12930: move Gradle docs from ./help/ to new ./dev-docs/ directory" This reverts commit 2d8650d36cc65b3161f009be85fcfd2fa8ff637c. > Add developer documentation to source repo > -- > > Key: SOLR-12930 > URL: https://issues.apache.org/jira/browse/SOLR-12930 > Project: Solr > Issue Type: Improvement > Components: Tests >Reporter: Mark Miller >Priority: Major > Attachments: solr-dev-docs.zip > > Time Spent: 1h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-11207) Add OWASP dependency checker to detect security vulnerabilities in third party libraries
[ https://issues.apache.org/jira/browse/SOLR-11207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023882#comment-17023882 ] ASF subversion and git services commented on SOLR-11207: Commit 5ab59f59ac48c00c7f2047a92a5c7c0451490cf1 in lucene-solr's branch refs/heads/gradle-master from Dawid Weiss [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=5ab59f5 ] SOLR-11207: minor changes: - added 'owasp' task to the root project. This depends on dependencyCheckAggregate which seems to be a better fit for multi-module projects than dependencyCheckAnalyze (the difference is vague to me from plugin's documentation). - you can run the "gradlew owasp" task explicitly and it'll run the validation without any flags. - the owasp task is only added to check if validation.owasp property is true. I think this should stay as the default on non-CI systems (developer defaults) because it's a significant chunk of time it takes to download and validate dependencies. - I'm not sure *all* configurations should be included in the check... perhaps we should only limit ourselves to actual runtime dependencies not build dependencies, solr-ref-guide, etc. > Add OWASP dependency checker to detect security vulnerabilities in third > party libraries > > > Key: SOLR-11207 > URL: https://issues.apache.org/jira/browse/SOLR-11207 > Project: Solr > Issue Type: Improvement > Components: Build >Affects Versions: 6.0 >Reporter: Hrishikesh Gadre >Assignee: Jan Høydahl >Priority: Major > Time Spent: 3h 20m > Remaining Estimate: 0h > > Lucene/Solr project depends on number of third party libraries. Some of those > libraries contain security vulnerabilities. Upgrading to versions of those > libraries that have fixes for those vulnerabilities is a simple, critical > step we can take to improve the security of the system. But for that we need > a tool which can scan the Lucene/Solr dependencies and look up the security > database for known vulnerabilities. 
> I found that [OWASP > dependency-checker|https://jeremylong.github.io/DependencyCheck/dependency-check-ant/] > can be used for this purpose. It provides a ant task which we can include in > the Lucene/Solr build. We also need to figure out how (and when) to invoke > this dependency-checker. But this can be figured out once we complete the > first step of integrating this tool with the Lucene/Solr build system. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
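Dawid's follow-up commit above translates to roughly the following in the root build script. This is a sketch based on his comment, not the committed code verbatim; the property name (validation.owasp) and task wiring follow his description, and dependencyCheckAggregate is the task contributed by the OWASP dependency-check Gradle plugin:

```groovy
// Only wire up the expensive OWASP check when explicitly requested,
// e.g. `gradlew -Pvalidation.owasp=true owasp` on CI, because
// downloading and validating the CVE database takes significant time.
if (Boolean.parseBoolean(project.findProperty("validation.owasp")?.toString() ?: "false")) {
  tasks.register("owasp") {
    // Aggregate fits multi-module projects better than dependencyCheckAnalyze.
    dependsOn "dependencyCheckAggregate"
  }
}
```

Keeping the default off for local builds preserves fast developer defaults while letting CI opt in.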
[jira] [Commented] (SOLR-14214) Ref Guide: Clean up info about clients other than SolrJ
[ https://issues.apache.org/jira/browse/SOLR-14214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023877#comment-17023877 ] ASF subversion and git services commented on SOLR-14214: Commit ba77a5f2eb13ffb418b84dac1df957dc3e9e2247 in lucene-solr's branch refs/heads/gradle-master from Cassandra Targett [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=ba77a5f ] SOLR-14214: Clean up client lists and references > Ref Guide: Clean up info about clients other than SolrJ > --- > > Key: SOLR-14214 > URL: https://issues.apache.org/jira/browse/SOLR-14214 > Project: Solr > Issue Type: Improvement > Components: documentation >Reporter: Cassandra Targett >Priority: Major > > The Ref Guide page client-api-lineup.adoc may have been updated at some point > since Nov 2011, the last time it says it was updated, but I would guess > probably not very recently. > It really would be worth going through the list to see which ones are still > active and removing those that would not work with modern versions of Solr > (say, 6.x or 7.x+?). > My personal POV is that all info on clients should be kept in the Wiki > (cwiki) and the Ref Guide merely link to that - that would allow client > maintainers to keep info about their clients up to date without needing to be > a committer in order to update the Ref Guide. > That approach would mean pretty much removing everything from the > client-api-lineup.adoc page, and also likely removing most if not all of the > other Client pages for Ruby, Python, and JS. > However it plays out, we should take a look at those pages and update > according to the current state of the client universe. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9174) Bump default gradle memory to 2g
[ https://issues.apache.org/jira/browse/LUCENE-9174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023885#comment-17023885 ] ASF subversion and git services commented on LUCENE-9174: - Commit 6f85ec04602aa083b4512667d37e36e0213b5c35 in lucene-solr's branch refs/heads/gradle-master from Dawid Weiss [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=6f85ec0 ] LUCENE-9174: Bump default gradle memory to 2g > Bump default gradle memory to 2g > > > Key: LUCENE-9174 > URL: https://issues.apache.org/jira/browse/LUCENE-9174 > Project: Lucene - Core > Issue Type: Task >Reporter: Dawid Weiss >Assignee: Dawid Weiss >Priority: Major > > I see these from time to time so I'll bump the daemon's heap to 2 gigs. Don't > know why it needs to much... > {code} > Expiring Daemon because JVM heap space is exhausted > Daemon will be stopped at the end of the build after running out of JVM memory > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-13749) Implement support for joining across collections with multiple shards ( XCJF )
[ https://issues.apache.org/jira/browse/SOLR-13749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023879#comment-17023879 ] ASF subversion and git services commented on SOLR-13749: Commit 127ce3e360ad88cb0a77a58d81eb09df00c04045 in lucene-solr's branch refs/heads/gradle-master from Gus Heck [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=127ce3e ] SOLR-13749 adjust changes to reflect backport to 8.5 > Implement support for joining across collections with multiple shards ( XCJF ) > -- > > Key: SOLR-13749 > URL: https://issues.apache.org/jira/browse/SOLR-13749 > Project: Solr > Issue Type: New Feature >Reporter: Kevin Watters >Assignee: Gus Heck >Priority: Major > Fix For: 8.5 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > This ticket includes 2 query parsers. > The first one is the "Cross collection join filter" (XCJF) parser. This is > the "Cross-collection join filter" query parser. It can do a call out to a > remote collection to get a set of join keys to be used as a filter against > the local collection. > The second one is the Hash Range query parser that you can specify a field > name and a hash range, the result is that only the documents that would have > hashed to that range will be returned. > This query parser will do an intersection based on join keys between 2 > collections. > The local collection is the collection that you are searching against. > The remote collection is the collection that contains the join keys that you > want to use as a filter. > Each shard participating in the distributed request will execute a query > against the remote collection. If the local collection is setup with the > compositeId router to be routed on the join key field, a hash range query is > applied to the remote collection query to only match the documents that > contain a potential match for the documents that are in the local shard/core. 
> > > Here's some vocab to help with the descriptions of the various parameters. > ||Term||Description|| > |Local Collection|This is the main collection that is being queried.| > |Remote Collection|This is the collection that the XCJFQuery will query to > resolve the join keys.| > |XCJFQuery|The lucene query that executes a search to get back a set of join > keys from a remote collection| > |HashRangeQuery|The lucene query that matches only the documents whose hash > code on a field falls within a specified range.| > > > ||Param ||Required ||Description|| > |collection|Required|The name of the external Solr collection to be queried > to retrieve the set of join key values ( required )| > |zkHost|Optional|The connection string to be used to connect to Zookeeper. > zkHost and solrUrl are both optional parameters, and at most one of them > should be specified. > If neither of zkHost or solrUrl are specified, the local Zookeeper cluster > will be used. ( optional )| > |solrUrl|Optional|The URL of the external Solr node to be queried ( optional > )| > |from|Required|The join key field name in the external collection ( required > )| > |to|Required|The join key field name in the local collection| > |v|See Note|The query to be executed against the external Solr collection to > retrieve the set of join key values. > Note: The original query can be passed at the end of the string or as the > "v" parameter. > It's recommended to use query parameter substitution with the "v" parameter > to ensure no issues arise with the default query parsers.| > |routed| |true / false. If true, the XCJF query will use each shard's hash > range to determine the set of join keys to retrieve for that shard. > This parameter improves the performance of the cross-collection join, but > it depends on the local collection being routed by the toField. 
If this > parameter is not specified, > the XCJF query will try to determine the correct value automatically.| > |ttl| |The length of time that an XCJF query in the cache will be considered > valid, in seconds. Defaults to 3600 (one hour). > The XCJF query will not be aware of changes to the remote collection, so > if the remote collection is updated, cached XCJF queries may give inaccurate > results. > After the ttl period has expired, the XCJF query will re-execute the join > against the remote collection.| > |_All others_| |Any normal Solr parameter can also be specified as a local > param.| > > Example Solr Config.xml changes: > > {code:xml} > <cache name="hash_vin" > class="solr.LRUCache" > size="128" > initialSize="0" > regenerator="solr.NoOpRegenerator"/> > {code} > > <queryP
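Reading the parameter table above together, a request using the cross-collection join filter would look roughly like this. The parser name (xcjf) and the collection/field names here are illustrative assumptions based on the issue description, not verified against the committed syntax:

```text
q=*:*
&fq={!xcjf collection=remoteCollection from=vin to=vin v=$joinQuery}
&joinQuery=model:Camry
```

The remote query is passed via the v parameter with substitution, as the description recommends, to avoid parsing issues with the default query parsers.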
[jira] [Commented] (SOLR-14189) Some whitespace characters bypass zero-length test in query parsers leading to 400 Bad Request
[ https://issues.apache.org/jira/browse/SOLR-14189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023883#comment-17023883 ] ASF subversion and git services commented on SOLR-14189: Commit efd0e8f3e89a954fcb870c9fab18cf19bcdbf97e in lucene-solr's branch refs/heads/gradle-master from andywebb1975 [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=efd0e8f ] SOLR-14189 switch from String.trim() to StringUtils.isBlank() (#1172) > Some whitespace characters bypass zero-length test in query parsers leading > to 400 Bad Request > -- > > Key: SOLR-14189 > URL: https://issues.apache.org/jira/browse/SOLR-14189 > Project: Solr > Issue Type: Improvement > Components: query parsers >Reporter: Andy Webb >Assignee: Uwe Schindler >Priority: Major > Fix For: master (9.0), 8.5 > > Time Spent: 2h > Remaining Estimate: 0h > > The edismax and some other query parsers treat pure whitespace queries as > empty queries, but they use Java's {{String.trim()}} method to normalise > queries. That method [only treats characters 0-32 as > whitespace|https://docs.oracle.com/javase/8/docs/api/java/lang/String.html#trim--]. > Other whitespace characters exist - such as {{U+3000 IDEOGRAPHIC SPACE}} - > which bypass the test and lead to {{400 Bad Request}} responses - see for > example {{/solr/mycollection/select?q=%E3%80%80&defType=edismax}} vs > {{/solr/mycollection/select?q=%20&defType=edismax}}. The first fails with the > exception: > {noformat} > org.apache.solr.search.SyntaxError: Cannot parse '': Encountered "" at > line 1, column 0. Was expecting one of: ... "+" ... "-" ... > ... "(" ... "*" ... ... ... ... ... > ... "[" ... "{" ... ... "filter(" ... ... > ... 
> {noformat} > [PR 1172|https://github.com/apache/lucene-solr/pull/1172] updates the dismax, > edismax and rerank query parsers to use > [StringUtils.isWhitespace()|https://commons.apache.org/proper/commons-lang/apidocs/org/apache/commons/lang3/StringUtils.html#isWhitespace-java.lang.String-] > which is aware of all whitespace characters. > Prior to the change, rerank behaves differently for U+3000 and U+0020 - with > the change, both the below give the "mandatory parameter" message: > {{q=greetings&rq=\{!rerank%20reRankQuery=$rqq%20reRankDocs=1000%20reRankWeight=3}&rqq=%E3%80%80}} > - generic 400 Bad Request > {{q=greetings&rq=\{!rerank%20reRankQuery=$rqq%20reRankDocs=1000%20reRankWeight=3}&rqq=%20}} > - 400 reporting "reRankQuery parameter is mandatory" -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
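The trim-vs-whitespace gap in SOLR-14189 is easy to demonstrate in plain Java, no Solr required: String.trim() only strips code points up to U+0020, while Character.isWhitespace recognizes U+3000, which is the same notion of whitespace that commons-lang's StringUtils.isBlank builds on. The isBlank below is a JDK-only stand-in for the commons-lang method:

```java
public class WhitespaceCheck {

    // StringUtils.isBlank-style check using only the JDK: true when the
    // string is empty or every code point is whitespace per Unicode.
    static boolean isBlank(String s) {
        return s.codePoints().allMatch(Character::isWhitespace);
    }

    public static void main(String[] args) {
        String ideographicSpace = "\u3000"; // U+3000 IDEOGRAPHIC SPACE
        // trim() only strips chars <= U+0020, so this stays non-empty...
        System.out.println(ideographicSpace.trim().isEmpty()); // false
        // ...which is how it slipped past the parsers' zero-length test.
        System.out.println(isBlank(ideographicSpace)); // true
    }
}
```

This is why the q=%E3%80%80 request reached the query parser as a "non-empty" query and blew up, while q=%20 was short-circuited as empty.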
[jira] [Commented] (SOLR-14189) Some whitespace characters bypass zero-length test in query parsers leading to 400 Bad Request
[ https://issues.apache.org/jira/browse/SOLR-14189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023884#comment-17023884 ] ASF subversion and git services commented on SOLR-14189: Commit fd49c903b8193aa27c56655915c1bf741135fa18 in lucene-solr's branch refs/heads/gradle-master from Uwe Schindler [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=fd49c90 ] SOLR-14189: Add changes entry > Some whitespace characters bypass zero-length test in query parsers leading > to 400 Bad Request > -- > > Key: SOLR-14189 > URL: https://issues.apache.org/jira/browse/SOLR-14189 > Project: Solr > Issue Type: Improvement > Components: query parsers >Reporter: Andy Webb >Assignee: Uwe Schindler >Priority: Major > Fix For: master (9.0), 8.5 > > Time Spent: 2h > Remaining Estimate: 0h > > The edismax and some other query parsers treat pure whitespace queries as > empty queries, but they use Java's {{String.trim()}} method to normalise > queries. That method [only treats characters 0-32 as > whitespace|https://docs.oracle.com/javase/8/docs/api/java/lang/String.html#trim--]. > Other whitespace characters exist - such as {{U+3000 IDEOGRAPHIC SPACE}} - > which bypass the test and lead to {{400 Bad Request}} responses - see for > example {{/solr/mycollection/select?q=%E3%80%80&defType=edismax}} vs > {{/solr/mycollection/select?q=%20&defType=edismax}}. The first fails with the > exception: > {noformat} > org.apache.solr.search.SyntaxError: Cannot parse '': Encountered "" at > line 1, column 0. Was expecting one of: ... "+" ... "-" ... > ... "(" ... "*" ... ... ... ... ... > ... "[" ... "{" ... ... "filter(" ... ... > ... 
> {noformat} > [PR 1172|https://github.com/apache/lucene-solr/pull/1172] updates the dismax, > edismax and rerank query parsers to use > [StringUtils.isWhitespace()|https://commons.apache.org/proper/commons-lang/apidocs/org/apache/commons/lang3/StringUtils.html#isWhitespace-java.lang.String-] > which is aware of all whitespace characters. > Prior to the change, rerank behaves differently for U+3000 and U+0020 - with > the change, both the below give the "mandatory parameter" message: > {{q=greetings&rq=\{!rerank%20reRankQuery=$rqq%20reRankDocs=1000%20reRankWeight=3}&rqq=%E3%80%80}} > - generic 400 Bad Request > {{q=greetings&rq=\{!rerank%20reRankQuery=$rqq%20reRankDocs=1000%20reRankWeight=3}&rqq=%20}} > - 400 reporting "reRankQuery parameter is mandatory" -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
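The trim-vs-whitespace distinction described above is easy to reproduce in plain Java. This sketch uses the JDK's own Character.isWhitespace() rather than the commons-lang StringUtils.isWhitespace() that PR 1172 adopts, but both recognize U+3000 where String.trim() does not:

```java
public class WhitespaceDemo {
    public static void main(String[] args) {
        String ideographic = "\u3000"; // U+3000 IDEOGRAPHIC SPACE
        String ascii = " ";            // U+0020 SPACE

        // String.trim() only strips code points <= U+0020, so the
        // ideographic space survives and the query looks non-empty:
        System.out.println(ideographic.trim().isEmpty()); // false
        System.out.println(ascii.trim().isEmpty());       // true

        // A Unicode-aware check (similar in spirit to commons-lang's
        // StringUtils.isWhitespace()) treats both as pure whitespace:
        System.out.println(ideographic.chars().allMatch(Character::isWhitespace)); // true
        System.out.println(ascii.chars().allMatch(Character::isWhitespace));       // true
    }
}
```

This is why the pre-patch parsers pass a lone U+3000 through to the Lucene query parser, which then fails with the SyntaxError shown above.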
[jira] [Commented] (SOLR-11207) Add OWASP dependency checker to detect security vulnerabilities in third party libraries
[ https://issues.apache.org/jira/browse/SOLR-11207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023880#comment-17023880 ] ASF subversion and git services commented on SOLR-11207: Commit 74a8d6d5acc67e4d5c6eeb640b8de3f820f0774b in lucene-solr's branch refs/heads/gradle-master from Jan Høydahl [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=74a8d6d ] SOLR-11207: Add OWASP dependency checker to gradle build (#1121) * SOLR-11207: Add OWASP dependency checker to gradle build > Add OWASP dependency checker to detect security vulnerabilities in third > party libraries > > > Key: SOLR-11207 > URL: https://issues.apache.org/jira/browse/SOLR-11207 > Project: Solr > Issue Type: Improvement > Components: Build >Affects Versions: 6.0 >Reporter: Hrishikesh Gadre >Assignee: Jan Høydahl >Priority: Major > Time Spent: 3h 20m > Remaining Estimate: 0h > > The Lucene/Solr project depends on a number of third party libraries. Some of those > libraries contain security vulnerabilities. Upgrading to versions of those > libraries that have fixes for those vulnerabilities is a simple, critical > step we can take to improve the security of the system. But for that we need > a tool which can scan the Lucene/Solr dependencies and look up the security > database for known vulnerabilities. > I found that [OWASP > dependency-checker|https://jeremylong.github.io/DependencyCheck/dependency-check-ant/] > can be used for this purpose. It provides an ant task which we can include in > the Lucene/Solr build. We also need to figure out how (and when) to invoke > this dependency-checker. But this can be figured out once we complete the > first step of integrating this tool with the Lucene/Solr build system. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9174) Bump default gradle memory to 2g
[ https://issues.apache.org/jira/browse/LUCENE-9174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023895#comment-17023895 ] Robert Muir commented on LUCENE-9174: - The problem isn't the heap, the problem is the daemon. I've been running builds almost constantly for many days now (daemon disabled) with 1GB; no issue. > Bump default gradle memory to 2g > > > Key: LUCENE-9174 > URL: https://issues.apache.org/jira/browse/LUCENE-9174 > Project: Lucene - Core > Issue Type: Task >Reporter: Dawid Weiss >Assignee: Dawid Weiss >Priority: Major > > I see these from time to time so I'll bump the daemon's heap to 2 gigs. Don't > know why it needs so much... > {code} > Expiring Daemon because JVM heap space is exhausted > Daemon will be stopped at the end of the build after running out of JVM memory > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9134) Port ant-regenerate tasks to Gradle build
[ https://issues.apache.org/jira/browse/LUCENE-9134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023900#comment-17023900 ] ASF subversion and git services commented on LUCENE-9134: - Commit c226207842e6237305cacf26bc0add6239a82aa3 in lucene-solr's branch refs/heads/gradle-master from Dawid Weiss [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=c226207 ] LUCENE-9134: lucene:core:jflexStandardTokenizerImpl > Port ant-regenerate tasks to Gradle build > - > > Key: LUCENE-9134 > URL: https://issues.apache.org/jira/browse/LUCENE-9134 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Erick Erickson >Assignee: Erick Erickson >Priority: Major > Attachments: LUCENE-9134.patch, core_regen.patch > > Time Spent: 5h 20m > Remaining Estimate: 0h > > Here are the "regenerate" targets I found in the ant version. There are a > couple that I don't have evidence for or against being rebuilt > // Very top level > {code:java} > ./build.xml: > ./build.xml: failonerror="true"> > ./build.xml: depends="regenerate,-check-after-regeneration"/> > {code} > // top level Lucene. This includes the core/build.xml and > test-framework/build.xml files > {code:java} > ./lucene/build.xml: > ./lucene/build.xml: inheritall="false"> > ./lucene/build.xml: > {code} > // This one has quite a number of customizations to > {code:java} > ./lucene/core/build.xml: depends="createLevAutomata,createPackedIntSources,jflex"/> > {code} > // This one has a bunch of code modifications _after_ javacc is run on > certain of the > // output files. Save this one for last? > {code:java} > ./lucene/queryparser/build.xml: > {code} > // the files under ../lucene/analysis... are pretty self contained. 
I expect > these could be done as a unit > {code:java} > ./lucene/analysis/build.xml: > ./lucene/analysis/build.xml: > ./lucene/analysis/common/build.xml: depends="jflex,unicode-data"/> > ./lucene/analysis/icu/build.xml: depends="gen-utr30-data-files,gennorm2,genrbbi"/> > ./lucene/analysis/kuromoji/build.xml: depends="build-dict"/> > ./lucene/analysis/nori/build.xml: depends="build-dict"/> > ./lucene/analysis/opennlp/build.xml: depends="train-test-models"/> > {code} > > // These _are_ regenerated from the top-level regenerate target, but for -- > LUCENE-9080//the changes were only in imports so there are no > //corresponding files checked in in that JIRA > {code:java} > ./lucene/expressions/build.xml: depends="run-antlr"/> > {code} > // Apparently unrelated to ./lucene/analysis/opennlp/build.xml > "train-test-models" target > // Apparently not rebuilt from the top level, but _are_ regenerated when > executed from > // ./solr/contrib/langid > {code:java} > ./solr/contrib/langid/build.xml: depends="train-test-models"/> > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Created] (LUCENE-9175) gradle build leaks tons of gradle-worker-classpath* files in tmpdir
Robert Muir created LUCENE-9175: --- Summary: gradle build leaks tons of gradle-worker-classpath* files in tmpdir Key: LUCENE-9175 URL: https://issues.apache.org/jira/browse/LUCENE-9175 Project: Lucene - Core Issue Type: Task Reporter: Robert Muir This may be a sign of classloader issues or similar that cause other issues like LUCENE-9174? {noformat} $ ls /tmp/gradle-worker-classpath* | wc -l 523 {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-9123) JapaneseTokenizer with search mode doesn't work with SynonymGraphFilter
[ https://issues.apache.org/jira/browse/LUCENE-9123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023778#comment-17023778 ] Tomoko Uchida edited comment on LUCENE-9123 at 1/26/20 6:47 PM: When reproducing this issue I noticed that JapaneseTokenizer (mode=search) gives positionIncrements=1 for the decompounded token "株式" instead of 0. This looks strange to me, is this an expected behaviour? If not, this may affect the synonyms handling? And please ignore my previous comment... I was mistaken about position increment. was (Author: tomoko uchida): When reproducing this issue I noticed that JapaneseTokenizer (mode=search) gives positionIncrements=1 for the decompounded token "株式" instead of 0. This looks strange to me, is this an expected behaviour? If not, this may affect the synonyms handling? > JapaneseTokenizer with search mode doesn't work with SynonymGraphFilter > --- > > Key: LUCENE-9123 > URL: https://issues.apache.org/jira/browse/LUCENE-9123 > Project: Lucene - Core > Issue Type: Bug > Components: modules/analysis >Affects Versions: 8.4 >Reporter: Kazuaki Hiraga >Assignee: Tomoko Uchida >Priority: Major > Attachments: LUCENE-9123.patch, LUCENE-9123_8x.patch > > > JapaneseTokenizer with `mode=search` or `mode=extended` doesn't work with > both of SynonymGraphFilter and SynonymFilter when JT generates multiple > tokens as an output. If we use `mode=normal`, it should be fine. However, we > would like to use decomposed tokens that can maximize the chance to increase > recall. 
> Snippet of schema: > {code:xml} > positionIncrementGap="100" autoGeneratePhraseQueries="false"> > > > synonyms="lang/synonyms_ja.txt" > tokenizerFactory="solr.JapaneseTokenizerFactory"/> > > > tags="lang/stoptags_ja.txt" /> > > > > > > minimumLength="4"/> > > > > > {code} > A synonym entry that generates the error: > {noformat} > 株式会社,コーポレーション > {noformat} > The following is the output on the console: > {noformat} > $ ./bin/solr create_core -c jp_test -d ../config/solrconfs > ERROR: Error CREATEing SolrCore 'jp_test': Unable to create core [jp_test3] > Caused by: term: 株式会社 analyzed to a token (株式会社) with position increment != 1 > (got: 0) > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-9123) JapaneseTokenizer with search mode doesn't work with SynonymGraphFilter
[ https://issues.apache.org/jira/browse/LUCENE-9123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023778#comment-17023778 ] Tomoko Uchida edited comment on LUCENE-9123 at 1/26/20 6:49 PM: When reproducing this issue I noticed that JapaneseTokenizer (mode=search) gives positionIncrements=1 for the decompounded token "株式" instead of 0. This looks strange to me, is this an expected behaviour? If not, this may affect the synonyms handling? please ignore my comment above... I was mistaken about position increment. was (Author: tomoko uchida): When reproducing this issue I noticed that JapaneseTokenizer (mode=search) gives positionIncrements=1 for the decompounded token "株式" instead of 0. This looks strange to me, is this an expected behaviour? If not, this may affect the synonyms handling? And please ignore my previous comment... I was mistaken about position increment. > JapaneseTokenizer with search mode doesn't work with SynonymGraphFilter > --- > > Key: LUCENE-9123 > URL: https://issues.apache.org/jira/browse/LUCENE-9123 > Project: Lucene - Core > Issue Type: Bug > Components: modules/analysis >Affects Versions: 8.4 >Reporter: Kazuaki Hiraga >Assignee: Tomoko Uchida >Priority: Major > Attachments: LUCENE-9123.patch, LUCENE-9123_8x.patch > > > JapaneseTokenizer with `mode=search` or `mode=extended` doesn't work with > both of SynonymGraphFilter and SynonymFilter when JT generates multiple > tokens as an output. If we use `mode=normal`, it should be fine. However, we > would like to use decomposed tokens that can maximize the chance to increase > recall. 
> Snippet of schema: > {code:xml} > positionIncrementGap="100" autoGeneratePhraseQueries="false"> > > > synonyms="lang/synonyms_ja.txt" > tokenizerFactory="solr.JapaneseTokenizerFactory"/> > > > tags="lang/stoptags_ja.txt" /> > > > > > > minimumLength="4"/> > > > > > {code} > A synonym entry that generates the error: > {noformat} > 株式会社,コーポレーション > {noformat} > The following is the output on the console: > {noformat} > $ ./bin/solr create_core -c jp_test -d ../config/solrconfs > ERROR: Error CREATEing SolrCore 'jp_test': Unable to create core [jp_test3] > Caused by: term: 株式会社 analyzed to a token (株式会社) with position increment != 1 > (got: 0) > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-13936) Schema/Config endpoints to modify configset with no core/collection
[ https://issues.apache.org/jira/browse/SOLR-13936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023904#comment-17023904 ] Ishan Chattopadhyaya commented on SOLR-13936: - [~apoorvprecisely], I've updated the SIP document with your text. https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=139627548. Thanks for helping us out with the wording of the SIP. Can you please update the patch/PR with unit tests so that we can quickly review/commit? > Schema/Config endpoints to modify configset with no core/collection > --- > > Key: SOLR-13936 > URL: https://issues.apache.org/jira/browse/SOLR-13936 > Project: Solr > Issue Type: Sub-task > Components: config-api >Reporter: Apoorv Bhawsar >Priority: Minor > Time Spent: 1h 20m > Remaining Estimate: 0h > > All schema/config configurations should work even in cases where a collection > is not associated with them > This jira will involve > 1. Refactoring existing handler/manager to work without {{SolrCore}} > 2. Adding {{/api/cluster}} endpoints to support such modifications > Endpoints - > * {{/api/cluster/configset/\{name}/schema}} > * {{/api/cluster/configset/\{name}/config}} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] dweiss commented on issue #1199: LUCENE-9134: Port ant-regenerate tasks to Gradle build
dweiss commented on issue #1199: LUCENE-9134: Port ant-regenerate tasks to Gradle build URL: https://github.com/apache/lucene-solr/pull/1199#issuecomment-578532163 This is great work, Erick and is very much appreciated. I do have a "but" though -- it's large and goes through a number of those tasks at once. I'm sorry I've been slow in taking in your patches. I can't really find a chunk of time large enough to review and correct certain issues in a large patch like this one. I'd really like to have minimalistic build fragments that only deal with one thing at a time. It's different from ant (and arguably different from how other projects structure gradle builds) but to me it makes reasoning about a particular build aspect simpler. Take jflex for example as it is really self-contained. You need to have access to jflex at a given version (no need to download anything -- you just declare a configuration and a dependency), you need a top-level task (so that it shows up in help) and you need to configure tasks that are attached to it in each project where we generate stuff from jflex files. I just committed an example that regenerates StandardTokenizerImpl in lucene/core - please take a look at the sources and see if it matches what I tried to express above. When you run "gradlew jflex" it'll recreate StandardTokenizerImpl.java... in fact when you run git diff you won't even see the difference because the regenerated file is identical to what it was before (which I think should be an ideal goal for now because we don't want to generate stuff other than ant does). The remaining jflex regeneration targets can be appended to this file, making it a clean, single-objective concern. When or if at some point somebody decides that a different way to deal with jflex files is more attractive (for example use an external plugin or move the custom task to buildSrc) those changes remain pretty much local to this file. This is an automated message from the Apache Git Service. 
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
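The structure dweiss describes -- a tool declared as a configuration dependency, a top-level umbrella task, and per-project generation tasks -- might be sketched roughly as below. This is a hypothetical illustration only; the task names, jflex version, and file path are assumptions, not the actual committed fragment.

```groovy
// Hypothetical sketch of a self-contained jflex build fragment.
// The tool is a declared dependency of a dedicated configuration,
// so nothing needs to be downloaded by hand.
configurations {
  jflex
}

dependencies {
  jflex "de.jflex:jflex:1.7.0" // version is illustrative
}

// Top-level task so it shows up in `gradlew tasks` / help.
task jflex {
  description "Regenerates sources from .jflex files."
  dependsOn "jflexStandardTokenizerImpl" // resolved lazily by name
}

// In each project owning .jflex files, a concrete generation task.
task jflexStandardTokenizerImpl(type: JavaExec) {
  classpath = configurations.jflex
  main = "jflex.Main"
  args "src/java/org/apache/lucene/analysis/standard/StandardTokenizerImpl.jflex"
}
```

Keeping all jflex-related targets in one such file is what makes the concern "single-objective": replacing the mechanism later (an external plugin, a buildSrc task) stays local to this fragment.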
[jira] [Updated] (LUCENE-9166) gradle build: test failures need stacktraces
[ https://issues.apache.org/jira/browse/LUCENE-9166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-9166: Attachment: LUCENE-9166.patch > gradle build: test failures need stacktraces > > > Key: LUCENE-9166 > URL: https://issues.apache.org/jira/browse/LUCENE-9166 > Project: Lucene - Core > Issue Type: Bug >Reporter: Robert Muir >Priority: Major > Attachments: LUCENE-9166.patch > > > Test failures are missing the stacktrace. Worse yet, it tells you go to look > at a separate (very long) filename which also has no stacktrace :( > I know gradle tries really hard to be quiet and not say anything, but when a > test fails, that isn't the time or place :) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9166) gradle build: test failures need stacktraces
[ https://issues.apache.org/jira/browse/LUCENE-9166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023906#comment-17023906 ] Robert Muir commented on LUCENE-9166: - Attached is a fix. Gradle has an inappropriate "stack trace filter" by default that is removing all of the stacktrace (especially if you hit exc say from a base test class, such as LuceneTestCase) before: {noformat} org.apache.lucene.TestDemo > classMethod FAILED java.lang.IllegalArgumentException: An SPI class of type org.apache.lucene.codecs.Codec with name 'BOGUS' does not exist. You need to add the corresponding JAR file supporting this SPI to your classpath. The current classpath supports the following names: [Lucene84, Asserting, CheapBastard, FastCompressingStoredFields, FastDecompressionCompressingStoredFields, HighCompressionCompressingStoredFields, DummyCompressingStoredFields, SimpleText] {noformat} after: {noformat} org.apache.lucene.TestDemo > classMethod FAILED java.lang.IllegalArgumentException: An SPI class of type org.apache.lucene.codecs.Codec with name 'BOGUS' does not exist. You need to add the corresponding JAR file supporting this SPI to your classpath. 
The current classpath supports the following names: [Lucene84, Asserting, CheapBastard, FastCompressingStoredFields, FastDecompressionCompressingStoredFields, HighCompressionCompressingStoredFields, DummyCompressingStoredFields, SimpleText] at __randomizedtesting.SeedInfo.seed([F03E3EEA39CA3E35]:0) at org.apache.lucene.util.NamedSPILoader.lookup(NamedSPILoader.java:116) at org.apache.lucene.codecs.Codec.forName(Codec.java:116) at org.apache.lucene.util.TestRuleSetupAndRestoreClassEnv.before(TestRuleSetupAndRestoreClassEnv.java:195) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:44) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64) at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:54) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:370) at com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:826) at 
java.base/java.lang.Thread.run(Thread.java:830) {noformat} > gradle build: test failures need stacktraces > > > Key: LUCENE-9166 > URL: https://issues.apache.org/jira/browse/LUCENE-9166 > Project: Lucene - Core > Issue Type: Bug >Reporter: Robert Muir >Priority: Major > Attachments: LUCENE-9166.patch > > > Test failures are missing the stacktrace. Worse yet, it tells you go to look > at a separate (very long) filename which also has no stacktrace :( > I know gradle tries really hard to be quiet and not say anything, but when a > test fails, that isn't the time or place :) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
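The kind of knob involved can be sketched with Gradle's test-logging DSL. This is illustrative only -- the attached LUCENE-9166.patch is the authoritative fix; the snippet merely shows where Gradle's default trace filtering and truncation are configured.

```groovy
// Sketch: print full, unfiltered stack traces for failing tests.
tasks.withType(Test) {
  testLogging {
    events "failed"
    exceptionFormat "full"   // whole exception chain, not just the message
    showStackTraces true
    stackTraceFilters = []   // disable Gradle's default stack trace filtering
  }
}
```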
[jira] [Created] (SOLR-14220) Unable to build 7_7 or 8_4 due to missing dependency
Karl Stoney created SOLR-14220: -- Summary: Unable to build 7_7 or 8_4 due to missing dependency Key: SOLR-14220 URL: https://issues.apache.org/jira/browse/SOLR-14220 Project: Solr Issue Type: Bug Security Level: Public (Default Security Level. Issues are Public) Components: Build Affects Versions: 8.4, 7.7 Reporter: Karl Stoney Attempting to build from: 7_7: https://github.com/apache/lucene-solr/commit/7a309c21ebbc1b08d9edf67802b63fc0bc7affcf or 8_4: https://github.com/apache/lucene-solr/commit/7d3ac7c284b26ce62f41d3b8686f70c7d6bd758d Results in the same build failure: {code:java} BUILD FAILED /usr/local/autotrader/app/lucene-solr/solr/build.xml:685: The following error occurred while executing this line: /usr/local/autotrader/app/lucene-solr/solr/build.xml:656: The following error occurred while executing this line: /usr/local/autotrader/app/lucene-solr/lucene/common-build.xml:653: Error downloading wagon provider from the remote repository: Missing: -- 1) org.apache.maven.wagon:wagon-ssh:jar:1.0-beta-7 Try downloading the file manually from the project website. Then, install it using the command: mvn install:install-file -DgroupId=org.apache.maven.wagon -DartifactId=wagon-ssh -Dversion=1.0-beta-7 -Dpackaging=jar -Dfile=/path/to/file Alternatively, if you host your own repository you can deploy the file there: mvn deploy:deploy-file -DgroupId=org.apache.maven.wagon -DartifactId=wagon-ssh -Dversion=1.0-beta-7 -Dpackaging=jar -Dfile=/path/to/file -Durl=[url] -DrepositoryId=[id] Path to dependency: 1) unspecified:unspecified:jar:0.0 2) org.apache.maven.wagon:wagon-ssh:jar:1.0-beta-7 -- 1 required artifact is missing. for artifact: unspecified:unspecified:jar:0.0 from the specified remote repositories: central (http://repo1.maven.org/maven2) {code} Previously building 7_7 from 3aad3311a97256a8537dd04165c67edcce1c153c, and 8_4 from c0b96fd305946b2564b967272e6e23c59ab0b5da worked fine. 
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-12325) introduce uniqueBlockQuery(parent:true) aggregation for JSON Facet
[ https://issues.apache.org/jira/browse/SOLR-12325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023923#comment-17023923 ] Mikhail Khludnev commented on SOLR-12325: - [~munendrasn] thanks for your feedback. I think it's worth having tests covering absent fields, values, no matches, and so on. I don't share the concern about introducing a new aggregation by suffixing the existing one. I feel like the proposal of introducing a first-arg enum is inconvenient for users; for example, the {{query}} enum mimics {{json.query}}, and I'm afraid it will just confuse users. However, given that {{FunctionQParser.parseNestedQuery()}} requires the argument to start with {{{}} or {{$}}, we can recognize the query case in: {{uniqueBlock(\{!v=type_s:parent})}} and {{uniqueBlock(\{!v=$type_param})}}. But it's not possible to distinguish {{uniqueBlock($field_or_q_param)}} nor handle {{uniqueBlock(\{! v=type_s:parent})}}. What does everyone think about it? > introduce uniqueBlockQuery(parent:true) aggregation for JSON Facet > -- > > Key: SOLR-12325 > URL: https://issues.apache.org/jira/browse/SOLR-12325 > Project: Solr > Issue Type: New Feature > Components: Facet Module >Reporter: Mikhail Khludnev >Assignee: Mikhail Khludnev >Priority: Major > Fix For: 8.5 > > Attachments: SOLR-12325.patch, SOLR-12325.patch, SOLR-12325.patch > > Time Spent: 1.5h > Remaining Estimate: 0h > > It might be faster twin for {{uniqueBlock(\_root_)}}. Please utilise buildin > query parsing method, don't invent your own. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-12325) introduce uniqueBlockQuery(parent:true) aggregation for JSON Facet
[ https://issues.apache.org/jira/browse/SOLR-12325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023923#comment-17023923 ] Mikhail Khludnev edited comment on SOLR-12325 at 1/26/20 8:50 PM: -- [~munendrasn] thanks for your feedback. I think it's worth having tests covering absent fields, values, no matches, and so on. I don't share the concern about introducing a new aggregation by suffixing the existing one. I feel like the proposal of introducing a first-arg enum is inconvenient for users; for example, the {{query}} enum mimics {{json.query}}, and I'm afraid it will just confuse users. However, given that {{FunctionQParser.parseNestedQuery()}} requires the argument to start with {{{!}} or {{$}}, we can recognize the query case in: uniqueBlock(\{!v=type_s:parent}) and uniqueBlock(\{!v=$type_param}). But it's not possible to distinguish uniqueBlock($field_or_q_param) nor handle uniqueBlock(\{! v=type_s:parent}). What does everyone think about it? was (Author: mkhludnev): [~munendrasn] thanks for your feedback. I think it's worth having tests covering absent fields, values, no matches, and so on. I don't share the concern about introducing a new aggregation by suffixing the existing one. I feel like the proposal of introducing a first-arg enum is inconvenient for users; for example, the {{query}} enum mimics {{json.query}}, and I'm afraid it will just confuse users. However, given that {{FunctionQParser.parseNestedQuery()}} requires the argument to start with {{{}} or {{$}}, we can recognize the query case in: {{uniqueBlock(\{!v=type_s:parent})}} and {{uniqueBlock(\{!v=$type_param})}}. But it's not possible to distinguish {{uniqueBlock($field_or_q_param)}} nor handle {{uniqueBlock(\{! v=type_s:parent})}}. 
What does everyone think about it? > introduce uniqueBlockQuery(parent:true) aggregation for JSON Facet > -- > > Key: SOLR-12325 > URL: https://issues.apache.org/jira/browse/SOLR-12325 > Project: Solr > Issue Type: New Feature > Components: Facet Module >Reporter: Mikhail Khludnev >Assignee: Mikhail Khludnev >Priority: Major > Fix For: 8.5 > > Attachments: SOLR-12325.patch, SOLR-12325.patch, SOLR-12325.patch > > Time Spent: 1.5h > Remaining Estimate: 0h > > It might be faster twin for {{uniqueBlock(\_root_)}}. Please utilise buildin > query parsing method, don't invent your own. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-12325) introduce uniqueBlockQuery(parent:true) aggregation for JSON Facet
[ https://issues.apache.org/jira/browse/SOLR-12325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023923#comment-17023923 ] Mikhail Khludnev edited comment on SOLR-12325 at 1/26/20 8:50 PM: -- [~munendrasn] thanks for your feedback. I think it's worth having tests covering absent fields, values, no matches, and so on. I don't share the concern about introducing a new aggregation by suffixing the existing one. I feel like the proposal of introducing a first-arg enum is inconvenient for users; for example, the {{query}} enum mimics {{json.query}}, and I'm afraid it will just confuse users. However, given that {{FunctionQParser.parseNestedQuery()}} requires the argument to start with {{\{!}} or {{$}}, we can recognize the query case in: uniqueBlock(\{!v=type_s:parent}) and uniqueBlock(\{!v=$type_param}). But it's not possible to distinguish uniqueBlock($field_or_q_param) nor handle uniqueBlock(\{! v=type_s:parent}). What does everyone think about it? was (Author: mkhludnev): [~munendrasn] thanks for your feedback. I think it's worth having tests covering absent fields, values, no matches, and so on. I don't share the concern about introducing a new aggregation by suffixing the existing one. I feel like the proposal of introducing a first-arg enum is inconvenient for users; for example, the {{query}} enum mimics {{json.query}}, and I'm afraid it will just confuse users. However, given that {{FunctionQParser.parseNestedQuery()}} requires the argument to start with {{{!}} or {{$}}, we can recognize the query case in: uniqueBlock(\{!v=type_s:parent}) and uniqueBlock(\{!v=$type_param}). But it's not possible to distinguish uniqueBlock($field_or_q_param) nor handle uniqueBlock(\{! v=type_s:parent}). What does everyone think about it? 
> introduce uniqueBlockQuery(parent:true) aggregation for JSON Facet > -- > > Key: SOLR-12325 > URL: https://issues.apache.org/jira/browse/SOLR-12325 > Project: Solr > Issue Type: New Feature > Components: Facet Module >Reporter: Mikhail Khludnev >Assignee: Mikhail Khludnev >Priority: Major > Fix For: 8.5 > > Attachments: SOLR-12325.patch, SOLR-12325.patch, SOLR-12325.patch > > Time Spent: 1.5h > Remaining Estimate: 0h > > It might be faster twin for {{uniqueBlock(\_root_)}}. Please utilise buildin > query parsing method, don't invent your own. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-12325) introduce uniqueBlockQuery(parent:true) aggregation for JSON Facet
[ https://issues.apache.org/jira/browse/SOLR-12325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023923#comment-17023923 ] Mikhail Khludnev edited comment on SOLR-12325 at 1/26/20 8:51 PM: -- [~munendrasn] thanks for your feedback. I think it's worth having tests covering absent fields, values, no matches, and so on. I don't share the concern about introducing a new aggregation by suffixing the existing one. I feel like the proposal of introducing a first-arg enum is inconvenient for users; for example, the {{query}} enum mimics {{json.query}}, and I'm afraid it will just confuse users. However, given that {{FunctionQParser.parseNestedQuery()}} requires the argument to start with a curly brace or a dollar sign, we can recognize the query case in: uniqueBlock(\{!v=type_s:parent}) and uniqueBlock(\{!v=$type_param}). But it's not possible to distinguish uniqueBlock($field_or_q_param) nor handle uniqueBlock(\{! v=type_s:parent}). What does everyone think about it? was (Author: mkhludnev): [~munendrasn] thanks for your feedback. I think it's worth having tests covering absent fields, values, no matches, and so on. I don't share the concern about introducing a new aggregation by suffixing the existing one. I feel like the proposal of introducing a first-arg enum is inconvenient for users; for example, the {{query}} enum mimics {{json.query}}, and I'm afraid it will just confuse users. However, given that {{FunctionQParser.parseNestedQuery()}} requires the argument to start with \{! or \$, we can recognize the query case in: uniqueBlock(\{!v=type_s:parent}) and uniqueBlock(\{!v=$type_param}). But it's not possible to distinguish uniqueBlock($field_or_q_param) nor handle uniqueBlock(\{! v=type_s:parent}). What does everyone think about it? 
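To make the syntax under discussion concrete, a JSON Facet request combining the existing {{uniqueBlock}} with the proposed query form might look roughly like this. The field names are made up for illustration, and {{uniqueBlockQuery}} is the proposal being discussed on this issue, not released syntax:

```json
{
  "query": "type_s:child",
  "facet": {
    "categories": {
      "type": "terms",
      "field": "category_s",
      "facet": {
        "parents": "uniqueBlock(_root_)",
        "parents_by_query": "uniqueBlockQuery(type_s:parent)"
      }
    }
  }
}
```

Both aggregations count distinct parent blocks per bucket; the query form would identify parents by a query instead of by the {{\_root_}} field.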
[jira] [Commented] (SOLR-14202) Old segments are not deleted after commit
[ https://issues.apache.org/jira/browse/SOLR-14202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023930#comment-17023930 ] Erick Erickson commented on SOLR-14202: --- It ought not to be hard to alter the attached program to do two things: 1> index into whatever fields you really use; you can see that the ones I used are pretty generic. 2> use whatever component you think is relevant to your setup. You mentioned the suggester, for instance. I believe Lucene checks on startup for segments that the segments_gen file does _not_ point to and deletes them; this is consistent with a searcher not being closed and with the files disappearing on restart rather than shutdown. If you can share your conf directory, maybe I'll have some time to run a test in parallel. > Old segments are not deleted after commit > - > > Key: SOLR-14202 > URL: https://issues.apache.org/jira/browse/SOLR-14202 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: SolrCloud >Affects Versions: 8.4 >Reporter: Jörn Franke >Priority: Major > Attachments: eoe.zip > > > The data directory of a collection is growing and growing. It seems that old > segments are not deleted; they are only deleted during start of Solr. > How to reproduce: have any collection (e.g. the example collection) and start > indexing documents. Even during the indexing the data directory grows > significantly - much more than expected (by several orders of magnitude). If certain > documents are updated (without significantly increasing the amount of data), > the index data directory again grows by several orders of magnitude. Even for small > collections the needed space explodes. > This shrinks significantly if Solr is stopped and then started. During > startup (not shutdown) Solr purges all those segments if not needed (* > sometimes some, but not a significant amount, is deleted during shutdown).
This > is of course not a good workaround for normal operations. > It does not seem to have an effect on queries (their performance does not seem > to change). > The configs have not changed before and after the upgrade (e.g. from Solr 8.2 > to 8.3 to 8.4, not across major versions), so I assume it could be related to > Solr 8.4. It may also have been present in Solr 8.3 (not sure), but not in 8.2. > > IndexConfig is pretty much default: lock type: native, autoCommit: 15000, > openSearcher=false, autoSoftCommit: -1 (reproducible with autoCommit 5000). > Nevertheless, it did not happen in previous versions of Solr and the config > did not change.
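For reference, the settings the reporter lists correspond roughly to a solrconfig.xml fragment like the one below. This is a sketch of a default-style configuration, not the reporter's actual file; note that in solrconfig.xml the autoCommit settings live under updateHandler, while lockType lives under indexConfig:

```xml
<indexConfig>
  <lockType>native</lockType>
</indexConfig>

<updateHandler class="solr.DirectUpdateHandler2">
  <!-- hard commit every 15s without opening a new searcher -->
  <autoCommit>
    <maxTime>15000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <!-- soft commits disabled -->
  <autoSoftCommit>
    <maxTime>-1</maxTime>
  </autoSoftCommit>
</updateHandler>
```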
[jira] [Commented] (SOLR-11207) Add OWASP dependency checker to detect security vulnerabilities in third party libraries
[ https://issues.apache.org/jira/browse/SOLR-11207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023939#comment-17023939 ] Jan Høydahl commented on SOLR-11207: Thanks for the cleanup; a separate task 'owasp' and using the property to attach it to check makes sense! Closing this. > Add OWASP dependency checker to detect security vulnerabilities in third > party libraries > > > Key: SOLR-11207 > URL: https://issues.apache.org/jira/browse/SOLR-11207 > Project: Solr > Issue Type: Improvement > Components: Build >Affects Versions: 6.0 >Reporter: Hrishikesh Gadre >Assignee: Jan Høydahl >Priority: Major > Time Spent: 3h 20m > Remaining Estimate: 0h > > The Lucene/Solr project depends on a number of third-party libraries. Some of those > libraries contain security vulnerabilities. Upgrading to versions of those > libraries that have fixes for those vulnerabilities is a simple, critical > step we can take to improve the security of the system. But for that we need > a tool which can scan the Lucene/Solr dependencies and look up known > vulnerabilities in a security database. > I found that [OWASP > dependency-checker|https://jeremylong.github.io/DependencyCheck/dependency-check-ant/] > can be used for this purpose. It provides an Ant task which we can include in > the Lucene/Solr build. We also need to figure out how (and when) to invoke > this dependency-checker. But this can be figured out once we complete the > first step of integrating this tool with the Lucene/Solr build system.
[jira] [Resolved] (SOLR-11207) Add OWASP dependency checker to detect security vulnerabilities in third party libraries
[ https://issues.apache.org/jira/browse/SOLR-11207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl resolved SOLR-11207. Fix Version/s: 8.5 Resolution: Fixed
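The arrangement Jan describes - a standalone scan task that is only wired into {{check}} when a property is set - could be sketched in a build.gradle along these lines. The plugin version, task name, and property name here are illustrative, not necessarily the ones committed on SOLR-11207:

```groovy
// Sketch only: names and versions are illustrative.
plugins {
  id 'org.owasp.dependencycheck' version '5.3.0'
}

// A separate 'owasp' task that runs the vulnerability scan on demand.
task owasp {
  dependsOn 'dependencyCheckAnalyze'
}

// Attach it to 'check' only when explicitly requested, e.g.
//   gradlew check -Pvalidation.owasp=true
if (Boolean.parseBoolean(project.findProperty('validation.owasp') ?: 'false')) {
  check.dependsOn owasp
}
```

Keeping the scan opt-in avoids slowing down every {{check}} run with a network-bound CVE database update, while still giving CI a single switch to turn it on.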
[GitHub] [lucene-solr] ErickErickson commented on issue #1199: LUCENE-9134: Port ant-regenerate tasks to Gradle build
ErickErickson commented on issue #1199: LUCENE-9134: Port ant-regenerate tasks to Gradle build URL: https://github.com/apache/lucene-solr/pull/1199#issuecomment-578548576 Starting over with a new model This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9134) Port ant-regenerate tasks to Gradle build
[ https://issues.apache.org/jira/browse/LUCENE-9134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023947#comment-17023947 ] Erick Erickson commented on LUCENE-9134: OK, thanks Dawid. I suspected some/all of what I've done so far would be throw-away while I got my feet wet with "the gradle way". Or maybe that's "Dawid's way" ;) And, for that matter, understood what the heck the ant stuff was doing. Humor aside, it's great that you're willing to lend some structure to the gradle effort, that helps keep things coherent rather than ad-hoc, with many different structures depending on who did which bit. I'll close the PR and start over with your model, now that I have an approach I'm _starting_ to see how they all fit together, and I can do these in smaller chunks. > Port ant-regenerate tasks to Gradle build > - > > Key: LUCENE-9134 > URL: https://issues.apache.org/jira/browse/LUCENE-9134 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Erick Erickson >Assignee: Erick Erickson >Priority: Major > Attachments: LUCENE-9134.patch, core_regen.patch > > Time Spent: 5.5h > Remaining Estimate: 0h > > Here are the "regenerate" targets I found in the ant version. There are a > couple that I don't have evidence for or against being rebuilt > // Very top level > {code:java} > ./build.xml: > ./build.xml: failonerror="true"> > ./build.xml: depends="regenerate,-check-after-regeneration"/> > {code} > // top level Lucene. This includes the core/build.xml and > test-framework/build.xml files > {code:java} > ./lucene/build.xml: > ./lucene/build.xml: inheritall="false"> > ./lucene/build.xml: > {code} > // This one has quite a number of customizations to > {code:java} > ./lucene/core/build.xml: depends="createLevAutomata,createPackedIntSources,jflex"/> > {code} > // This one has a bunch of code modifications _after_ javacc is run on > certain of the > // output files. Save this one for last? 
> {code:java} > ./lucene/queryparser/build.xml: > {code} > // the files under ../lucene/analysis... are pretty self contained. I expect > these could be done as a unit > {code:java} > ./lucene/analysis/build.xml: > ./lucene/analysis/build.xml: > ./lucene/analysis/common/build.xml: depends="jflex,unicode-data"/> > ./lucene/analysis/icu/build.xml: depends="gen-utr30-data-files,gennorm2,genrbbi"/> > ./lucene/analysis/kuromoji/build.xml: depends="build-dict"/> > ./lucene/analysis/nori/build.xml: depends="build-dict"/> > ./lucene/analysis/opennlp/build.xml: depends="train-test-models"/> > {code} > > // These _are_ regenerated from the top-level regenerate target, but for -- > LUCENE-9080//the changes were only in imports so there are no > //corresponding files checked in in that JIRA > {code:java} > ./lucene/expressions/build.xml: depends="run-antlr"/> > {code} > // Apparently unrelated to ./lucene/analysis/opennlp/build.xml > "train-test-models" target > // Apparently not rebuilt from the top level, but _are_ regenerated when > executed from > // ./solr/contrib/langid > {code:java} > ./solr/contrib/langid/build.xml: depends="train-test-models"/> > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] ErickErickson closed pull request #1199: LUCENE-9134: Port ant-regenerate tasks to Gradle build
ErickErickson closed pull request #1199: LUCENE-9134: Port ant-regenerate tasks to Gradle build URL: https://github.com/apache/lucene-solr/pull/1199
[GitHub] [lucene-solr] msokolov commented on issue #564: prorated early termination
msokolov commented on issue #564: prorated early termination URL: https://github.com/apache/lucene-solr/pull/564#issuecomment-578549784 Abandoning as I plan to post a better alternative that achieves the same result without the random behavior.
[GitHub] [lucene-solr] msokolov closed pull request #564: prorated early termination
msokolov closed pull request #564: prorated early termination URL: https://github.com/apache/lucene-solr/pull/564
[jira] [Updated] (LUCENE-9134) Port ant-regenerate tasks to Gradle build
[ https://issues.apache.org/jira/browse/LUCENE-9134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson updated LUCENE-9134: --- Description: Take II about organizing this beast. A list of items that needs to be added or requires work. If you'd like to work on any of these, please add your name to the list. See process comments at parent (LUCENE-9077) * Implement jflex task in lucene/core * Implement jflex tasks in lucene/analysis * Implement javacc tasks in lucene/queryparser * Implement javacc tasks in solr/core * Implement python tasks in lucene (? there are several javadocs mentions in the build.xml, this may be irrelevant to the Gradle effort). * Implement python tasks in lucene/core * Implement python tasks in lucene/analysis * Here are the "regenerate" targets I found in the ant version. There are a couple that I don't have evidence for or against being rebuilt // Very top level {code:java} ./build.xml: ./build.xml: ./build.xml: {code} // top level Lucene. This includes the core/build.xml and test-framework/build.xml files {code:java} ./lucene/build.xml: ./lucene/build.xml: ./lucene/build.xml: {code} // This one has quite a number of customizations to {code:java} ./lucene/core/build.xml: {code} // This one has a bunch of code modifications _after_ javacc is run on certain of the // output files. Save this one for last? {code:java} ./lucene/queryparser/build.xml: {code} // the files under ../lucene/analysis... are pretty self contained. 
I expect these could be done as a unit {code:java} ./lucene/analysis/build.xml: ./lucene/analysis/build.xml: ./lucene/analysis/common/build.xml: ./lucene/analysis/icu/build.xml: ./lucene/analysis/kuromoji/build.xml: ./lucene/analysis/nori/build.xml: ./lucene/analysis/opennlp/build.xml: {code} // These _are_ regenerated from the top-level regenerate target, but for – LUCENE-9080//the changes were only in imports so there are no //corresponding files checked in in that JIRA {code:java} ./lucene/expressions/build.xml: {code} // Apparently unrelated to ./lucene/analysis/opennlp/build.xml "train-test-models" target // Apparently not rebuilt from the top level, but _are_ regenerated when executed from // ./solr/contrib/langid {code:java} ./solr/contrib/langid/build.xml: {code} was: Here are the "regenerate" targets I found in the ant version. There are a couple that I don't have evidence for or against being rebuilt // Very top level {code:java} ./build.xml: ./build.xml: ./build.xml: {code} // top level Lucene. This includes the core/build.xml and test-framework/build.xml files {code:java} ./lucene/build.xml: ./lucene/build.xml: ./lucene/build.xml: {code} // This one has quite a number of customizations to {code:java} ./lucene/core/build.xml: {code} // This one has a bunch of code modifications _after_ javacc is run on certain of the // output files. Save this one for last? {code:java} ./lucene/queryparser/build.xml: {code} // the files under ../lucene/analysis... are pretty self contained. 
I expect these could be done as a unit {code:java} ./lucene/analysis/build.xml: ./lucene/analysis/build.xml: ./lucene/analysis/common/build.xml: ./lucene/analysis/icu/build.xml: ./lucene/analysis/kuromoji/build.xml: ./lucene/analysis/nori/build.xml: ./lucene/analysis/opennlp/build.xml: {code} // These _are_ regenerated from the top-level regenerate target, but for -- LUCENE-9080//the changes were only in imports so there are no //corresponding files checked in in that JIRA {code:java} ./lucene/expressions/build.xml: {code} // Apparently unrelated to ./lucene/analysis/opennlp/build.xml "train-test-models" target // Apparently not rebuilt from the top level, but _are_ regenerated when executed from // ./solr/contrib/langid {code:java} ./solr/contrib/langid/build.xml: {code} > Port ant-regenerate tasks to Gradle build > - > > Key: LUCENE-9134 > URL: https://issues.apache.org/jira/browse/LUCENE-9134 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Erick Erickson >Assignee: Erick Erickson >Priority: Major > Attachments: LUCENE-9134.patch, core_regen.patch > > Time Spent: 5h 50m > Remaining Estimate: 0h > > Take II about organizing this beast. > A list of items that needs to be added or requires work. If you'd like to > work on any of these, please add your name to the list. See process comments > at parent (LUCENE-9077) > * Implement jflex task in lucene/core > * Implement jflex tasks in lucene/analysis > * Implement javacc tasks in lucene/queryparser > * Implement javacc tasks in solr/core > * Implement python tasks in lucene (? there are several javadocs mentions in > the build.xml, this may be irrelevant to the Gradle effort). > * Implement python tasks in lu
[jira] [Updated] (LUCENE-9134) Port ant-regenerate tasks to Gradle build
[ https://issues.apache.org/jira/browse/LUCENE-9134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson updated LUCENE-9134: --- Description: Take II about organizing this beast. A list of items that needs to be added or requires work. If you'd like to work on any of these, please add your name to the list. See process comments at parent (LUCENE-9077) * Implement jflex task in lucene/core * Implement jflex tasks in lucene/analysis * Implement javacc tasks in lucene/queryparser (EOE) * Implement javacc tasks in solr/core (EOE) * Implement python tasks in lucene (? there are several javadocs mentions in the build.xml, this may be irrelevant to the Gradle effort). * Implement python tasks in lucene/core * Implement python tasks in lucene/analysis Here are the "regenerate" targets I found in the ant version. There are a couple that I don't have evidence for or against being rebuilt // Very top level {code:java} ./build.xml: ./build.xml: ./build.xml: {code} // top level Lucene. This includes the core/build.xml and test-framework/build.xml files {code:java} ./lucene/build.xml: ./lucene/build.xml: ./lucene/build.xml: {code} // This one has quite a number of customizations to {code:java} ./lucene/core/build.xml: {code} // This one has a bunch of code modifications _after_ javacc is run on certain of the // output files. Save this one for last? {code:java} ./lucene/queryparser/build.xml: {code} // the files under ../lucene/analysis... are pretty self contained. 
I expect these could be done as a unit {code:java} ./lucene/analysis/build.xml: ./lucene/analysis/build.xml: ./lucene/analysis/common/build.xml: ./lucene/analysis/icu/build.xml: ./lucene/analysis/kuromoji/build.xml: ./lucene/analysis/nori/build.xml: ./lucene/analysis/opennlp/build.xml: {code} // These _are_ regenerated from the top-level regenerate target, but for – LUCENE-9080//the changes were only in imports so there are no //corresponding files checked in in that JIRA {code:java} ./lucene/expressions/build.xml: {code} // Apparently unrelated to ./lucene/analysis/opennlp/build.xml "train-test-models" target // Apparently not rebuilt from the top level, but _are_ regenerated when executed from // ./solr/contrib/langid {code:java} ./solr/contrib/langid/build.xml: {code} was: Take II about organizing this beast. A list of items that needs to be added or requires work. If you'd like to work on any of these, please add your name to the list. See process comments at parent (LUCENE-9077) * Implement jflex task in lucene/core * Implement jflex tasks in lucene/analysis * Implement javacc tasks in lucene/queryparser * Implement javacc tasks in solr/core * Implement python tasks in lucene (? there are several javadocs mentions in the build.xml, this may be irrelevant to the Gradle effort). * Implement python tasks in lucene/core * Implement python tasks in lucene/analysis * Here are the "regenerate" targets I found in the ant version. There are a couple that I don't have evidence for or against being rebuilt // Very top level {code:java} ./build.xml: ./build.xml: ./build.xml: {code} // top level Lucene. This includes the core/build.xml and test-framework/build.xml files {code:java} ./lucene/build.xml: ./lucene/build.xml: ./lucene/build.xml: {code} // This one has quite a number of customizations to {code:java} ./lucene/core/build.xml: {code} // This one has a bunch of code modifications _after_ javacc is run on certain of the // output files. Save this one for last? 
{code:java} ./lucene/queryparser/build.xml: {code} // the files under ../lucene/analysis... are pretty self contained. I expect these could be done as a unit {code:java} ./lucene/analysis/build.xml: ./lucene/analysis/build.xml: ./lucene/analysis/common/build.xml: ./lucene/analysis/icu/build.xml: ./lucene/analysis/kuromoji/build.xml: ./lucene/analysis/nori/build.xml: ./lucene/analysis/opennlp/build.xml: {code} // These _are_ regenerated from the top-level regenerate target, but for – LUCENE-9080//the changes were only in imports so there are no //corresponding files checked in in that JIRA {code:java} ./lucene/expressions/build.xml: {code} // Apparently unrelated to ./lucene/analysis/opennlp/build.xml "train-test-models" target // Apparently not rebuilt from the top level, but _are_ regenerated when executed from // ./solr/contrib/langid {code:java} ./solr/contrib/langid/build.xml: {code} > Port ant-regenerate tasks to Gradle build > - > > Key: LUCENE-9134 > URL: https://issues.apache.org/jira/browse/LUCENE-9134 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Erick Erickson >Assignee: Erick Erickson >Priority: Major > Attachments: LUCENE-9134.patch, core_regen.patch > >
[jira] [Comment Edited] (LUCENE-9134) Port ant-regenerate tasks to Gradle build
[ https://issues.apache.org/jira/browse/LUCENE-9134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023947#comment-17023947 ] Erick Erickson edited comment on LUCENE-9134 at 1/26/20 10:59 PM: -- OK, thanks Dawid. I suspected some/all of what I've done so far would be throw-away while I got my feet wet with "the gradle way". Or maybe that's "Dawid's way" ;) And, for that matter, while I understood what the heck the ant stuff was doing. Humor aside, it's great that you're willing to lend some structure to the gradle effort; that helps keep things coherent rather than ad-hoc, with many different ways of doing something depending on who did which bit. I'll close the PR and start over with your model. Now that I have an approach, I'm _starting_ to see how they all fit together, and I can do these in smaller chunks. Should I put the delete bits in? If so, my impulse would be to put them in the @TaskAction in JFlexTask. Regardless of whether I should, would that be the correct place for something like that? And is there anything left to do with the jflex task you put in for StandardTokenizerImpl before pushing it, except verification or perhaps putting the delete parts back? I'll look at javacc in the meantime.
> Port ant-regenerate tasks to Gradle build > - > > Key: LUCENE-9134 > URL: https://issues.apache.org/jira/browse/LUCENE-9134 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Erick Erickson >Assignee: Erick Erickson >Priority: Major > Attachments: LUCENE-9134.patch, core_regen.patch > > Time Spent: 5h 50m > Remaining Estimate: 0h > > Take II about organizing this beast. > A list of items that needs to be added or requires work. If you'd like to > work on any of these, please add your name to the list. See process comments > at parent (LUCENE-9077) > * Implement jflex task in lucene/core > * Implement jflex tasks in lucene/analysis > * Implement javacc tasks in lucene/queryparser (EOE) > * Implement javacc tasks in solr/core (EOE) > * Implement python tasks in lucene (? there are several javadocs mentions in > the build.xml, this may be irrelevant to the Gradle effort). > * Implement python tasks in lucene/core > * Implement python tasks in lucene/analysis > > Here are the "regenerate" targets I found in the ant version. There are a > couple that I don't have evidence for or against being rebuilt > // Very top level > {code:java} > ./build.xml: > ./build.xml: failonerror="true"> > ./build.xml: depends="regenerate,-check-after-regeneration"/> > {code} > // top level Lucene. This includes the core/build.xml and > test-framework/build.xml files > {code:java} > ./lucene/build.xml: > ./lucene/build.xml: inheritall="false"> > ./lucene/build.xml: > {code} > // This one has quite a number of customizations to > {code:java} > ./lucene/core/build.xml: depends="createLevAutomata,createPackedIntSources,jflex"/> > {code} > // This one has a bunch of code modifications _after_ javacc is run on > certain of the > // output files. Save this one for last? > {code:java} > ./lucene/queryparser/build.xml: > {code} > // the files under ../lucene/analysis... are pretty self contained. 
I expect > these could be done as a unit > {code:java} > ./lucene/analysis/build.xml: > ./lucene/analysis/build.xml: > ./lucene/analysis/common/build.xml: depends="jflex,unicode-data"/> > ./lucene/analysis/icu/build.xml: depends="gen-utr30-data-files,gennorm2,genrbbi"/> > ./lucene/analysis/kuromoji/build.xml: depends="build-dict"/> > ./lucene/analysis/nori/build.xml: depends="build-dict"/> > ./lucene/analysis/opennlp/build.xml: depends="train-test-models"/> > {code} > > // These _are_ regenerated from the top-level regenerate target, but for – > LUCENE-9080//the changes were only in imports so there are no > //corresponding files checked in in that JIRA > {code:java} > ./lucene/expressions/build.xml: depends="run-antlr"/> > {code} > // Apparently unrelated to ./lucene/analysis/opennlp/build.xml > "train-test-models" target > // Apparently not rebuilt from the top level, but _are_ regenerated when > executed from >
[jira] [Created] (SOLR-14221) Upgrade restlet
Jan Høydahl created SOLR-14221: -- Summary: Upgrade restlet Key: SOLR-14221 URL: https://issues.apache.org/jira/browse/SOLR-14221 Project: Solr Issue Type: Improvement Security Level: Public (Default Security Level. Issues are Public) Reporter: Jan Høydahl Upgrade restlet to latest version -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] janhoy opened a new pull request #1211: SOLR-14221: Upgrade restlet to version 2.4.0
janhoy opened a new pull request #1211: SOLR-14221: Upgrade restlet to version 2.4.0 URL: https://github.com/apache/lucene-solr/pull/1211 See https://issues.apache.org/jira/browse/SOLR-14221 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9134) Port ant-regenerate tasks to Gradle build
[ https://issues.apache.org/jira/browse/LUCENE-9134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17024057#comment-17024057 ] Erick Erickson commented on LUCENE-9134: While I'm looking at the javacc task, a looming question for a later task: lucene/util/automaton/createLevAutomata.py wants: "moman/finenight/python". We were getting it from: "https://bitbucket.org/jpbarrette/moman/get/5c5c2a1e4dea.zip";. What's the theory on how to have Gradle deal with it? > Port ant-regenerate tasks to Gradle build > - > > Key: LUCENE-9134 > URL: https://issues.apache.org/jira/browse/LUCENE-9134 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Erick Erickson >Assignee: Erick Erickson >Priority: Major > Attachments: LUCENE-9134.patch, core_regen.patch > > Time Spent: 5h 50m > Remaining Estimate: 0h > > Take II about organizing this beast. > A list of items that needs to be added or requires work. If you'd like to > work on any of these, please add your name to the list. See process comments > at parent (LUCENE-9077) > * Implement jflex task in lucene/core > * Implement jflex tasks in lucene/analysis > * Implement javacc tasks in lucene/queryparser (EOE) > * Implement javacc tasks in solr/core (EOE) > * Implement python tasks in lucene (? there are several javadocs mentions in > the build.xml, this may be irrelevant to the Gradle effort). > * Implement python tasks in lucene/core > * Implement python tasks in lucene/analysis > > Here are the "regenerate" targets I found in the ant version. There are a > couple that I don't have evidence for or against being rebuilt > // Very top level > {code:java} > ./build.xml: > ./build.xml: failonerror="true"> > ./build.xml: depends="regenerate,-check-after-regeneration"/> > {code} > // top level Lucene. 
This includes the core/build.xml and > test-framework/build.xml files > {code:java} > ./lucene/build.xml: > ./lucene/build.xml: inheritall="false"> > ./lucene/build.xml: > {code} > // This one has quite a number of customizations to > {code:java} > ./lucene/core/build.xml: depends="createLevAutomata,createPackedIntSources,jflex"/> > {code} > // This one has a bunch of code modifications _after_ javacc is run on > certain of the > // output files. Save this one for last? > {code:java} > ./lucene/queryparser/build.xml: > {code} > // the files under ../lucene/analysis... are pretty self contained. I expect > these could be done as a unit > {code:java} > ./lucene/analysis/build.xml: > ./lucene/analysis/build.xml: > ./lucene/analysis/common/build.xml: depends="jflex,unicode-data"/> > ./lucene/analysis/icu/build.xml: depends="gen-utr30-data-files,gennorm2,genrbbi"/> > ./lucene/analysis/kuromoji/build.xml: depends="build-dict"/> > ./lucene/analysis/nori/build.xml: depends="build-dict"/> > ./lucene/analysis/opennlp/build.xml: depends="train-test-models"/> > {code} > > // These _are_ regenerated from the top-level regenerate target, but for – > LUCENE-9080//the changes were only in imports so there are no > //corresponding files checked in in that JIRA > {code:java} > ./lucene/expressions/build.xml: depends="run-antlr"/> > {code} > // Apparently unrelated to ./lucene/analysis/opennlp/build.xml > "train-test-models" target > // Apparently not rebuilt from the top level, but _are_ regenerated when > executed from > // ./solr/contrib/langid > {code:java} > ./solr/contrib/langid/build.xml: depends="train-test-models"/> > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
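On the moman question above, one plausible answer, sketched here as plain Python rather than as a real Gradle task (all names are hypothetical and nothing here is the actual build code): treat the zip like any other cached dependency, keyed by a known checksum, so the download happens once and later builds can work offline:

```python
# Hypothetical caching sketch: fetch an external archive once, verify its
# checksum, and serve later requests from the cache. `fetch` stands in for
# the real HTTP download (e.g. the bitbucket URL in the comment).
import hashlib
import os
import tempfile


def cached_fetch(cache_dir, name, expected_sha256, fetch):
    """Return the cached file path, calling `fetch` only on a cache miss
    or a checksum mismatch."""
    os.makedirs(cache_dir, exist_ok=True)
    path = os.path.join(cache_dir, name)
    if os.path.exists(path):
        with open(path, "rb") as f:
            if hashlib.sha256(f.read()).hexdigest() == expected_sha256:
                return path  # cache hit: no network needed
    data = fetch()           # network download in the real build
    if hashlib.sha256(data).hexdigest() != expected_sha256:
        raise ValueError("checksum mismatch for " + name)
    with open(path, "wb") as f:
        f.write(data)
    return path


# Usage with a fake fetch so the sketch stays self-contained.
calls = []


def fake_fetch():
    calls.append(1)
    return b"moman zip bytes"


sha = hashlib.sha256(b"moman zip bytes").hexdigest()
cache = tempfile.mkdtemp()
cached_fetch(cache, "moman.zip", sha, fake_fetch)
cached_fetch(cache, "moman.zip", sha, fake_fetch)
print(len(calls))  # 1: the second call was served from the cache
```

Pinning the expected checksum also guards against the upstream archive changing silently, which matters for reproducible `regenerate` runs.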
[jira] [Commented] (LUCENE-9123) JapaneseTokenizer with search mode doesn't work with SynonymGraphFilter
[ https://issues.apache.org/jira/browse/LUCENE-9123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17024058#comment-17024058 ] Kazuaki Hiraga commented on LUCENE-9123: [~romseygeek] Thank you for your comments. I think we can modify the output of Kuromoji to deal with the issue at this moment, since the current GraphTokenStream cannot deal with decompounded tokens and, in many situations, we don't think we need to keep the original tokens along with the decompounded ones. So, we can introduce a new option to absorb the originals for now. However, we think either SynonymGraphFilter or TokenStream should be able to deal with complex cases like the ones you have mentioned in a future release of Lucene. [~tomoko], Thank you for your hard work! Please let me know if there is anything I can do to help with your testing or updating. And thank you for creating a ticket that points out the issue in SynonymGraphFilter: LUCENE-9173. > JapaneseTokenizer with search mode doesn't work with SynonymGraphFilter > --- > > Key: LUCENE-9123 > URL: https://issues.apache.org/jira/browse/LUCENE-9123 > Project: Lucene - Core > Issue Type: Bug > Components: modules/analysis >Affects Versions: 8.4 >Reporter: Kazuaki Hiraga >Assignee: Tomoko Uchida >Priority: Major > Attachments: LUCENE-9123.patch, LUCENE-9123_8x.patch > > > JapaneseTokenizer with `mode=search` or `mode=extended` doesn't work with > either SynonymGraphFilter or SynonymFilter when the tokenizer generates multiple > tokens as output. If we use `mode=normal`, it should be fine. However, we > would like to use decomposed tokens to maximize the chance of increasing recall. 
> Snippet of schema:
> {code:xml}
> <fieldType name="text_ja" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="false">
>   <analyzer>
>     <tokenizer class="solr.JapaneseTokenizerFactory" mode="search"/>
>     <filter class="solr.SynonymGraphFilterFactory" synonyms="lang/synonyms_ja.txt" tokenizerFactory="solr.JapaneseTokenizerFactory"/>
>     <filter class="solr.JapaneseBaseFormFilterFactory"/>
>     <filter class="solr.JapanesePartOfSpeechStopFilterFactory" tags="lang/stoptags_ja.txt" />
>     <filter class="solr.CJKWidthFilterFactory"/>
>     <filter class="solr.JapaneseKatakanaStemFilterFactory" minimumLength="4"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
> </fieldType>
> {code}
> A synonym entry that triggers the error:
> {noformat}
> 株式会社,コーポレーション
> {noformat}
> The following is the console output:
> {noformat}
> $ ./bin/solr create_core -c jp_test -d ../config/solrconfs
> ERROR: Error CREATEing SolrCore 'jp_test': Unable to create core [jp_test3]
> Caused by: term: 株式会社 analyzed to a token (株式会社) with position increment != 1 (got: 0)
> {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
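To make the error above concrete, here is a toy model in plain Python (deliberately not Lucene's real API; all function names are invented) of what `mode=search` emits for 株式会社 and why the synonym-rule parser rejects it: the original compound overlaps the first decompounded part, so its position increment is 0 rather than the required 1.

```python
# Toy model of a decompounding tokenizer and the synonym parser's
# position-increment check. Not Lucene code.
def analyze_search_mode(term):
    """Pretend analysis of a compound: decompounded parts plus the
    original compound, which overlaps the first part (increment 0)."""
    # (token, positionIncrement) pairs, roughly as a Japanese tokenizer
    # in search mode might emit them for 株式会社 -> 株式 + 会社 (+ original).
    return [("株式", 1), ("株式会社", 0), ("会社", 1)]


def parse_synonym_term(term, analyze):
    """Mimic the check that produces the error shown in the issue:
    every analyzed token must advance the position by exactly 1."""
    for token, inc in analyze(term):
        if inc != 1:
            raise ValueError(
                "term: %s analyzed to a token (%s) with position "
                "increment != 1 (got: %d)" % (term, token, inc))
    return [tok for tok, _ in analyze(term)]


try:
    parse_synonym_term("株式会社", analyze_search_mode)
except ValueError as e:
    print(e)
```

With `mode=normal` the toy analyzer would emit a single token at increment 1 and the check would pass, which matches the observation in the issue description.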
[jira] [Commented] (SOLR-14095) Replace Java serialization with Javabin in Overseer operations
[ https://issues.apache.org/jira/browse/SOLR-14095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17024064#comment-17024064 ] Tomas Eduardo Fernandez Lobbe commented on SOLR-14095: -- Thanks for testing this Andy! I'll take a look at this tomorrow. Can you include the steps you did to reproduce this? Are you upgrading from Solr 8.4? or some older version? > Replace Java serialization with Javabin in Overseer operations > -- > > Key: SOLR-14095 > URL: https://issues.apache.org/jira/browse/SOLR-14095 > Project: Solr > Issue Type: Task >Reporter: Robert Muir >Assignee: Tomas Eduardo Fernandez Lobbe >Priority: Major > Fix For: master (9.0), 8.5 > > Attachments: SOLR-14095-json.patch, json-nl.patch > > Time Spent: 1h 20m > Remaining Estimate: 0h > > Removing the use of serialization is greatly preferred. > But if serialization over the wire must really happen, then we must use JDK's > serialization filtering capability to prevent havoc. > https://docs.oracle.com/javase/10/core/serialization-filtering1.htm#JSCOR-GUID-3ECB288D-E5BD-4412-892F-E9BB11D4C98A -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
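As an aside on the serialization-filtering link in the description: the same "allow-list what deserialization may instantiate" idea exists outside the JDK too. A minimal Python sketch of the concept (illustrative only, not Solr code) using `pickle.Unpickler.find_class`:

```python
# Restrict what unpickling may resolve, analogous in spirit to a JDK
# serialization filter rejecting unexpected classes.
import io
import pickle

SAFE = {("builtins", "dict"), ("builtins", "list"), ("builtins", "str")}


class FilteredUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Reject anything outside the allow-list.
        if (module, name) not in SAFE:
            raise pickle.UnpicklingError("blocked: %s.%s" % (module, name))
        return super().find_class(module, name)


def safe_loads(data):
    return FilteredUnpickler(io.BytesIO(data)).load()


print(safe_loads(pickle.dumps({"state": "active"})))  # plain dict: allowed
```

Of course, as the issue itself argues, the safer route is not to filter native serialization but to avoid it entirely in favor of a constrained wire format like Javabin or JSON.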
[GitHub] [lucene-solr] dsmiley commented on issue #1191: SOLR-14197 Reduce API of SolrResourceLoader
dsmiley commented on issue #1191: SOLR-14197 Reduce API of SolrResourceLoader URL: https://github.com/apache/lucene-solr/pull/1191#issuecomment-578586761 I rebased off master since there were some upstream changes. I also resolved some getInstancePath callers, though I didn't actually remove it yet. I think it can move (not remain in SRL) to ZkSolrResourceLoader (until 9x, then remove) and to a new StandaloneSolrResourceLoader. I think a StandaloneSolrResourceLoader is the next step at this point. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-13897) Unsafe publication of Terms object in ZkShardTerms
[ https://issues.apache.org/jira/browse/SOLR-13897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17024107#comment-17024107 ] ASF subversion and git services commented on SOLR-13897: Commit 776631254ffa900527fa1ed7bcf789265cb289c1 in lucene-solr's branch refs/heads/master from Shalin Shekhar Mangar [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=7766312 ] SOLR-13897: Fix unsafe publication of Terms object in ZkShardTerms that can cause visibility issues and race conditions under contention > Unsafe publication of Terms object in ZkShardTerms > -- > > Key: SOLR-13897 > URL: https://issues.apache.org/jira/browse/SOLR-13897 > Project: Solr > Issue Type: Bug > Components: SolrCloud >Affects Versions: 8.2, 8.3 >Reporter: Shalin Shekhar Mangar >Assignee: Shalin Shekhar Mangar >Priority: Major > Fix For: master (9.0) > > Attachments: SOLR-13897.patch, SOLR-13897.patch, SOLR-13897.patch, > SOLR-13897.patch > > > The Terms object in ZkShardTerms is written using a write lock but reading is > allowed freely. This is not safe and can cause visibility issues and > associated race conditions under contention. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (SOLR-13897) Unsafe publication of Terms object in ZkShardTerms
[ https://issues.apache.org/jira/browse/SOLR-13897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar updated SOLR-13897: - Status: Open (was: Patch Available) > Unsafe publication of Terms object in ZkShardTerms > -- > > Key: SOLR-13897 > URL: https://issues.apache.org/jira/browse/SOLR-13897 > Project: Solr > Issue Type: Bug > Components: SolrCloud >Affects Versions: 8.2, 8.3 >Reporter: Shalin Shekhar Mangar >Assignee: Shalin Shekhar Mangar >Priority: Major > Fix For: master (9.0) > > Attachments: SOLR-13897.patch, SOLR-13897.patch, SOLR-13897.patch, > SOLR-13897.patch > > > The Terms object in ZkShardTerms is written using a write lock but reading is > allowed freely. This is not safe and can cause visibility issues and > associated race conditions under contention. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-13897) Unsafe publication of Terms object in ZkShardTerms
[ https://issues.apache.org/jira/browse/SOLR-13897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17024108#comment-17024108 ] ASF subversion and git services commented on SOLR-13897: Commit 7316391d2dd77c486fa25b8435f0bcde33837a6d in lucene-solr's branch refs/heads/branch_8x from Shalin Shekhar Mangar [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=7316391 ] SOLR-13897: Fix unsafe publication of Terms object in ZkShardTerms that can cause visibility issues and race conditions under contention (cherry picked from commit 776631254ffa900527fa1ed7bcf789265cb289c1) > Unsafe publication of Terms object in ZkShardTerms > -- > > Key: SOLR-13897 > URL: https://issues.apache.org/jira/browse/SOLR-13897 > Project: Solr > Issue Type: Bug > Components: SolrCloud >Affects Versions: 8.2, 8.3 >Reporter: Shalin Shekhar Mangar >Assignee: Shalin Shekhar Mangar >Priority: Major > Fix For: master (9.0) > > Attachments: SOLR-13897.patch, SOLR-13897.patch, SOLR-13897.patch, > SOLR-13897.patch > > > The Terms object in ZkShardTerms is written using a write lock but reading is > allowed freely. This is not safe and can cause visibility issues and > associated race conditions under contention. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Resolved] (SOLR-13897) Unsafe publication of Terms object in ZkShardTerms
[ https://issues.apache.org/jira/browse/SOLR-13897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar resolved SOLR-13897. -- Fix Version/s: 8.5 Resolution: Fixed > Unsafe publication of Terms object in ZkShardTerms > -- > > Key: SOLR-13897 > URL: https://issues.apache.org/jira/browse/SOLR-13897 > Project: Solr > Issue Type: Bug > Components: SolrCloud >Affects Versions: 8.2, 8.3 >Reporter: Shalin Shekhar Mangar >Assignee: Shalin Shekhar Mangar >Priority: Major > Fix For: master (9.0), 8.5 > > Attachments: SOLR-13897.patch, SOLR-13897.patch, SOLR-13897.patch, > SOLR-13897.patch > > > The Terms object in ZkShardTerms is written using a write lock but reading is > allowed freely. This is not safe and can cause visibility issues and > associated race conditions under contention. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Created] (LUCENE-9176) TestEstimatePointCount failure after changing number of indexed points
Ignacio Vera created LUCENE-9176: Summary: TestEstimatePointCount failure after changing number of indexed points Key: LUCENE-9176 URL: https://issues.apache.org/jira/browse/LUCENE-9176 Project: Lucene - Core Issue Type: Test Reporter: Ignacio Vera These tests can now create situations in which there is only one leaf node, and the tests do not handle this situation properly. {code:java} ant test -Dtestcase=TestLucene60PointsFormat -Dtests.method=testEstimatePointCount -Dtests.seed=A921F5ACFEF2F5B6 -Dtests.slow=true -Dtests.badapples=true -Dtests.locale=ta-IN -Dtests.timezone=Asia/Kuala_Lumpur -Dtests.asserts=true -Dtests.file.encoding=ISO-8859-1 {code} {code:java} ant test -Dtestcase=TestLucene60PointsFormat -Dtests.method=testEstimatePointCount2Dims -Dtests.seed=99F4A087E8092D56 -Dtests.slow=true -Dtests.badapples=true -Dtests.locale=am-ET -Dtests.timezone=Asia/Calcutta -Dtests.asserts=true -Dtests.file.encoding=US-ASCII {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] iverase opened a new pull request #1212: LUCENE-9176: Handle the case when there is only one leaf node in TestEstimatePointCount
iverase opened a new pull request #1212: LUCENE-9176: Handle the case when there is only one leaf node in TestEstimatePointCount URL: https://github.com/apache/lucene-solr/pull/1212 see https://issues.apache.org/jira/browse/LUCENE-9176 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (SOLR-14194) Allow Highlighting to work for indexes with uniqueKey that is not stored
[ https://issues.apache.org/jira/browse/SOLR-14194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Wislowski updated SOLR-14194: - Attachment: SOLR-14194.patch Status: Patch Available (was: Patch Available) > Allow Highlighting to work for indexes with uniqueKey that is not stored > > > Key: SOLR-14194 > URL: https://issues.apache.org/jira/browse/SOLR-14194 > Project: Solr > Issue Type: Improvement > Components: highlighter >Affects Versions: master (9.0) >Reporter: Andrzej Wislowski >Assignee: David Smiley >Priority: Minor > Labels: highlighter > Fix For: master (9.0) > > Attachments: SOLR-14194.patch, SOLR-14194.patch > > > Highlighting requires uniqueKey to be a stored field. I have changed the > Highlighter to allow returning results on indexes whose uniqueKey is not a > stored field but is saved as a docValues type. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (SOLR-14194) Allow Highlighting to work for indexes with uniqueKey that is not stored
[ https://issues.apache.org/jira/browse/SOLR-14194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Wislowski updated SOLR-14194: - Attachment: (was: SOLR-14194.patch) > Allow Highlighting to work for indexes with uniqueKey that is not stored > > > Key: SOLR-14194 > URL: https://issues.apache.org/jira/browse/SOLR-14194 > Project: Solr > Issue Type: Improvement > Components: highlighter >Affects Versions: master (9.0) >Reporter: Andrzej Wislowski >Assignee: David Smiley >Priority: Minor > Labels: highlighter > Fix For: master (9.0) > > Attachments: SOLR-14194.patch, SOLR-14194.patch > > > Highlighting requires uniqueKey to be a stored field. I have changed > Highlighter allow returning results on indexes with uniqueKey that is a not > stored field, but saved as a docvalue type. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org