[jira] [Commented] (SOLR-12045) Move Analytics Component from contrib to core
[ https://issues.apache.org/jira/browse/SOLR-12045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023733#comment-17023733 ] Uwe Schindler commented on SOLR-12045: -- Your commit seems to break precommit (interestingly only on Windows): {noformat} validate-source-patterns: [source-patterns] Unescaped symbol "->" on line #43: solr/solr-ref-guide/src/analytics.adoc [source-patterns] Unescaped symbol "->" on line #52: solr/solr-ref-guide/src/analytics.adoc {noformat} > Move Analytics Component from contrib to core > - > > Key: SOLR-12045 > URL: https://issues.apache.org/jira/browse/SOLR-12045 > Project: Solr > Issue Type: Improvement >Affects Versions: 8.0 >Reporter: Houston Putman >Priority: Major > Fix For: 8.1, master (9.0) > > Attachments: SOLR-12045.rb-visibility.patch > > Time Spent: 1.5h > Remaining Estimate: 0h > > The Analytics Component currently lives in contrib. Since it includes no > external dependencies, there is no harm in moving it into core solr. > The analytics component would be included as a default search component and > the analytics handler (currently only used for analytics shard requests, > might be transitioned to handle user requests in the future) would be > included as an implicit handler. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] dweiss merged pull request #1121: SOLR-11207: Add OWASP dependency checker to gradle build
dweiss merged pull request #1121: SOLR-11207: Add OWASP dependency checker to gradle build URL: https://github.com/apache/lucene-solr/pull/1121 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-11207) Add OWASP dependency checker to detect security vulnerabilities in third party libraries
[ https://issues.apache.org/jira/browse/SOLR-11207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023735#comment-17023735 ] ASF subversion and git services commented on SOLR-11207: Commit 74a8d6d5acc67e4d5c6eeb640b8de3f820f0774b in lucene-solr's branch refs/heads/master from Jan Høydahl [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=74a8d6d ] SOLR-11207: Add OWASP dependency checker to gradle build (#1121) * SOLR-11207: Add OWASP dependency checker to gradle build > Add OWASP dependency checker to detect security vulnerabilities in third > party libraries > > > Key: SOLR-11207 > URL: https://issues.apache.org/jira/browse/SOLR-11207 > Project: Solr > Issue Type: Improvement > Components: Build >Affects Versions: 6.0 >Reporter: Hrishikesh Gadre >Assignee: Jan Høydahl >Priority: Major > Time Spent: 3h 20m > Remaining Estimate: 0h > > The Lucene/Solr project depends on a number of third party libraries. Some of those > libraries contain security vulnerabilities. Upgrading to versions of those > libraries that have fixes for those vulnerabilities is a simple, critical > step we can take to improve the security of the system. But for that we need > a tool which can scan the Lucene/Solr dependencies and look up the security > database for known vulnerabilities. > I found that [OWASP > dependency-checker|https://jeremylong.github.io/DependencyCheck/dependency-check-ant/] > can be used for this purpose. It provides an ant task which we can include in > the Lucene/Solr build. We also need to figure out how (and when) to invoke > this dependency-checker. But this can be figured out once we complete the > first step of integrating this tool with the Lucene/Solr build system.
[jira] [Commented] (SOLR-12045) Move Analytics Component from contrib to core
[ https://issues.apache.org/jira/browse/SOLR-12045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023743#comment-17023743 ] Mikhail Khludnev commented on SOLR-12045: - I'm sorry. I'll fix it in a few hours. > Move Analytics Component from contrib to core > - > > Key: SOLR-12045 > URL: https://issues.apache.org/jira/browse/SOLR-12045 > Project: Solr > Issue Type: Improvement >Affects Versions: 8.0 >Reporter: Houston Putman >Priority: Major > Fix For: 8.1, master (9.0) > > Attachments: SOLR-12045.rb-visibility.patch > > Time Spent: 1.5h > Remaining Estimate: 0h > > The Analytics Component currently lives in contrib. Since it includes no > external dependencies, there is no harm in moving it into core solr. > The analytics component would be included as a default search component and > the analytics handler (currently only used for analytics shard requests, > might be transitioned to handle user requests in the future) would be > included as an implicit handler.
[jira] [Created] (LUCENE-9173) SynonymGraphFilter doesn't correctly consume decompounded tokens (branched token graph)
Tomoko Uchida created LUCENE-9173: - Summary: SynonymGraphFilter doesn't correctly consume decompounded tokens (branched token graph) Key: LUCENE-9173 URL: https://issues.apache.org/jira/browse/LUCENE-9173 Project: Lucene - Core Issue Type: Bug Components: modules/analysis Reporter: Tomoko Uchida This is a derived issue from LUCENE-9123. When the tokenizer that is given to SynonymGraphFilter decompounds tokens or emits multiple tokens at the same position, SynonymGraphFilter cannot correctly handle them (an exception will be thrown). For example, JapaneseTokenizer (mode=SEARCH) would emit a token and two decompounded tokens for the text "株式会社": {code:java} 株式会社 (positionIncrement=0, positionLength=2) 株式 (positionIncrement=1, positionLength=1) 会社 (positionIncrement=1, positionLength=1) {code} Then if we register the synonym "株式会社,コーポレーション" with SynonymGraphFilter (tokenizerFactory=JapaneseTokenizerFactory), this exception is thrown. {code:java} Caused by: java.lang.IllegalArgumentException: term: 株式会社 analyzed to a token (株式会社) with position increment != 1 (got: 0) at org.apache.lucene.analysis.synonym.SynonymMap$Parser.analyze(SynonymMap.java:325) ~[lucene-analyzers-common-8.4.0.jar:8.4.0 bc02ab906445fcf4e297f4ef00ab4a54fdd72ca2 - jpountz - 2019-12-19 20:16:38] at org.apache.lucene.analysis.synonym.SolrSynonymParser.addInternal(SolrSynonymParser.java:114) ~[lucene-analyzers-common-8.4.0.jar:8.4.0 bc02ab906445fcf4e297f4ef00ab4a54fdd72ca2 - jpountz - 2019-12-19 20:16:38] at org.apache.lucene.analysis.synonym.SolrSynonymParser.parse(SolrSynonymParser.java:70) ~[lucene-analyzers-common-8.4.0.jar:8.4.0 bc02ab906445fcf4e297f4ef00ab4a54fdd72ca2 - jpountz - 2019-12-19 20:16:38] at org.apache.lucene.analysis.synonym.SynonymGraphFilterFactory.loadSynonyms(SynonymGraphFilterFactory.java:179) ~[lucene-analyzers-common-8.4.0.jar:8.4.0 bc02ab906445fcf4e297f4ef00ab4a54fdd72ca2 - jpountz - 2019-12-19 20:16:38] at 
org.apache.lucene.analysis.synonym.SynonymGraphFilterFactory.inform(SynonymGraphFilterFactory.java:154) ~[lucene-analyzers-common-8.4.0.jar:8.4.0 bc02ab906445fcf4e297f4ef00ab4a54fdd72ca2 - jpountz - 2019-12-19 20:16:38] {code} This isn't limited to JapaneseTokenizer; it is a more general issue with handling branched token graphs (decompounded tokens in the midstream).
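The position-increment rule that SynonymMap's parser enforces can be illustrated with a small standalone sketch. The `Token` record and `validate` helper below are hypothetical stand-ins, not the actual Lucene code; the token values are taken from the JapaneseTokenizer example above:

```java
import java.util.List;

public class PosIncCheck {
    // Minimal stand-in for a token carrying graph attributes.
    record Token(String term, int posInc, int posLen) {}

    // SynonymMap.Parser.analyze() requires each token of an analyzed
    // synonym entry to advance by exactly one position; this helper
    // mimics that check (illustrative only, not the Lucene source).
    static void validate(List<Token> stream) {
        for (Token t : stream) {
            if (t.posInc() != 1) {
                throw new IllegalArgumentException(
                    "term: " + t.term() + " analyzed to a token (" + t.term()
                    + ") with position increment != 1 (got: " + t.posInc() + ")");
            }
        }
    }

    public static void main(String[] args) {
        // JapaneseTokenizer (mode=SEARCH) output for 株式会社 as reported
        // above: the compound token overlaps the decompounded tokens,
        // so its position increment is 0 -- a branched token graph.
        List<Token> decompounded = List.of(
            new Token("株式会社", 0, 2),
            new Token("株式", 1, 1),
            new Token("会社", 1, 1));
        try {
            validate(decompounded);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // same message as the stack trace above
        }
    }
}
```

A flat (mode=normal) stream, where every token has posInc=1, passes this check, which is why the issue only appears with decompounding enabled.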
[jira] [Commented] (SOLR-11207) Add OWASP dependency checker to detect security vulnerabilities in third party libraries
[ https://issues.apache.org/jira/browse/SOLR-11207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023754#comment-17023754 ] ASF subversion and git services commented on SOLR-11207: Commit 5ab59f59ac48c00c7f2047a92a5c7c0451490cf1 in lucene-solr's branch refs/heads/master from Dawid Weiss [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=5ab59f5 ] SOLR-11207: minor changes: - added 'owasp' task to the root project. This depends on dependencyCheckAggregate which seems to be a better fit for multi-module projects than dependencyCheckAnalyze (the difference is vague to me from plugin's documentation). - you can run the "gradlew owasp" task explicitly and it'll run the validation without any flags. - the owasp task is only added to check if validation.owasp property is true. I think this should stay as the default on non-CI systems (developer defaults) because it's a significant chunk of time it takes to download and validate dependencies. - I'm not sure *all* configurations should be included in the check... perhaps we should only limit ourselves to actual runtime dependencies not build dependencies, solr-ref-guide, etc. > Add OWASP dependency checker to detect security vulnerabilities in third > party libraries > > > Key: SOLR-11207 > URL: https://issues.apache.org/jira/browse/SOLR-11207 > Project: Solr > Issue Type: Improvement > Components: Build >Affects Versions: 6.0 >Reporter: Hrishikesh Gadre >Assignee: Jan Høydahl >Priority: Major > Time Spent: 3h 20m > Remaining Estimate: 0h > > Lucene/Solr project depends on number of third party libraries. Some of those > libraries contain security vulnerabilities. Upgrading to versions of those > libraries that have fixes for those vulnerabilities is a simple, critical > step we can take to improve the security of the system. But for that we need > a tool which can scan the Lucene/Solr dependencies and look up the security > database for known vulnerabilities. 
> I found that [OWASP > dependency-checker|https://jeremylong.github.io/DependencyCheck/dependency-check-ant/] > can be used for this purpose. It provides an ant task which we can include in > the Lucene/Solr build. We also need to figure out how (and when) to invoke > this dependency-checker. But this can be figured out once we complete the > first step of integrating this tool with the Lucene/Solr build system.
[jira] [Commented] (LUCENE-9123) JapaneseTokenizer with search mode doesn't work with SynonymGraphFilter
[ https://issues.apache.org/jira/browse/LUCENE-9123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023757#comment-17023757 ] Tomoko Uchida commented on LUCENE-9123: --- I opened an issue for the SynonymGraphFilter: LUCENE-9173. I also found an issue about multi-word synonyms, LUCENE-8137; it seems to be a different issue from the one discussed here (but I'm not fully sure of that). > JapaneseTokenizer with search mode doesn't work with SynonymGraphFilter > --- > > Key: LUCENE-9123 > URL: https://issues.apache.org/jira/browse/LUCENE-9123 > Project: Lucene - Core > Issue Type: Bug > Components: modules/analysis >Affects Versions: 8.4 >Reporter: Kazuaki Hiraga >Assignee: Tomoko Uchida >Priority: Major > Attachments: LUCENE-9123.patch, LUCENE-9123_8x.patch > > > JapaneseTokenizer with `mode=search` or `mode=extended` doesn't work with > both of SynonymGraphFilter and SynonymFilter when JT generates multiple > tokens as an output. If we use `mode=normal`, it should be fine. However, we > would like to use decomposed tokens that can maximize the chance to increase > recall. > Snippet of schema: > {code:xml} > positionIncrementGap="100" autoGeneratePhraseQueries="false"> > > > synonyms="lang/synonyms_ja.txt" > tokenizerFactory="solr.JapaneseTokenizerFactory"/> > > > tags="lang/stoptags_ja.txt" /> > > > > > > minimumLength="4"/> > > > > > {code} > A synonym entry that generates an error: > {noformat} > 株式会社,コーポレーション > {noformat} > The following is an output on console: > {noformat} > $ ./bin/solr create_core -c jp_test -d ../config/solrconfs > ERROR: Error CREATEing SolrCore 'jp_test': Unable to create core [jp_test3] > Caused by: term: 株式会社 analyzed to a token (株式会社) with position increment != 1 > (got: 0) > {noformat}
[jira] [Commented] (SOLR-12045) Move Analytics Component from contrib to core
[ https://issues.apache.org/jira/browse/SOLR-12045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023764#comment-17023764 ] Uwe Schindler commented on SOLR-12045: -- It was not your problem. The reason for the issue was a config change regarding line endings and then a bug in the source-patterns checker. It did not split lines correctly because the line splitter regex was broken ({{\n\r}} instead of the correct {{\r\n}}). Because you touched the file it was checked out again on Jenkins, suddenly having new line endings. > Move Analytics Component from contrib to core > - > > Key: SOLR-12045 > URL: https://issues.apache.org/jira/browse/SOLR-12045 > Project: Solr > Issue Type: Improvement >Affects Versions: 8.0 >Reporter: Houston Putman >Priority: Major > Fix For: 8.1, master (9.0) > > Attachments: SOLR-12045.rb-visibility.patch > > Time Spent: 1.5h > Remaining Estimate: 0h > > The Analytics Component currently lives in contrib. Since it includes no > external dependencies, there is no harm in moving it into core solr. > The analytics component would be included as a default search component and > the analytics handler (currently only used for analytics shard requests, > might be transitioned to handle user requests in the future) would be > included as an implicit handler.
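The swapped escape sequence Uwe describes is easy to reproduce in plain Java. This is a standalone sketch, not the actual validate-source-patterns checker code; the file content is made up for illustration:

```java
public class LineSplitBug {
    public static void main(String[] args) {
        // A two-line file with Windows (CRLF) line endings.
        String content = "line one\r\nline two";

        // Buggy pattern: \n\r never matches a CRLF sequence, so the
        // whole file is treated as one line and per-line checks such
        // as "Unescaped symbol on line #N" misfire.
        String[] buggy = content.split("\n\r");
        System.out.println(buggy.length); // 1

        // Corrected pattern: \r\n splits the file as intended.
        String[] fixed = content.split("\r\n");
        System.out.println(fixed.length); // 2
    }
}
```

The bug only surfaces once a file is actually checked out with CRLF endings, which matches the observation that precommit broke on Windows after the line-ending config change.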
[jira] [Commented] (SOLR-14189) Some whitespace characters bypass zero-length test in query parsers leading to 400 Bad Request
[ https://issues.apache.org/jira/browse/SOLR-14189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023765#comment-17023765 ] Uwe Schindler commented on SOLR-14189: -- I will merge this now. > Some whitespace characters bypass zero-length test in query parsers leading > to 400 Bad Request > -- > > Key: SOLR-14189 > URL: https://issues.apache.org/jira/browse/SOLR-14189 > Project: Solr > Issue Type: Improvement > Components: query parsers >Reporter: Andy Webb >Assignee: Uwe Schindler >Priority: Major > Time Spent: 1h 50m > Remaining Estimate: 0h > > The edismax and some other query parsers treat pure whitespace queries as > empty queries, but they use Java's {{String.trim()}} method to normalise > queries. That method [only treats characters 0-32 as > whitespace|https://docs.oracle.com/javase/8/docs/api/java/lang/String.html#trim--]. > Other whitespace characters exist - such as {{U+3000 IDEOGRAPHIC SPACE}} - > which bypass the test and lead to {{400 Bad Request}} responses - see for > example {{/solr/mycollection/select?q=%E3%80%80&defType=edismax}} vs > {{/solr/mycollection/select?q=%20&defType=edismax}}. The first fails with the > exception: > {noformat} > org.apache.solr.search.SyntaxError: Cannot parse '': Encountered "" at > line 1, column 0. Was expecting one of: ... "+" ... "-" ... > ... "(" ... "*" ... ... ... ... ... > ... "[" ... "{" ... ... "filter(" ... ... > ... > {noformat} > [PR 1172|https://github.com/apache/lucene-solr/pull/1172] updates the dismax, > edismax and rerank query parsers to use > [StringUtils.isWhitespace()|https://commons.apache.org/proper/commons-lang/apidocs/org/apache/commons/lang3/StringUtils.html#isWhitespace-java.lang.String-] > which is aware of all whitespace characters. 
> Prior to the change, rerank behaves differently for U+3000 and U+0020 - with > the change, both the below give the "mandatory parameter" message: > {{q=greetings&rq=\{!rerank%20reRankQuery=$rqq%20reRankDocs=1000%20reRankWeight=3}&rqq=%E3%80%80}} > - generic 400 Bad Request > {{q=greetings&rq=\{!rerank%20reRankQuery=$rqq%20reRankDocs=1000%20reRankWeight=3}&rqq=%20}} > - 400 reporting "reRankQuery parameter is mandatory"
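The core of the problem - {{String.trim()}} strips only code points up to U+0020 - can be demonstrated in plain Java. This standalone sketch uses the JDK's {{Character.isWhitespace()}} and {{String.isBlank()}} (Java 11+) as equivalents of the commons-lang {{StringUtils}} check that the PR adopts:

```java
public class TrimVsBlank {
    public static void main(String[] args) {
        String ascii = " ";      // U+0020 SPACE
        String ideo = "\u3000";  // U+3000 IDEOGRAPHIC SPACE

        // String.trim() strips only code points <= U+0020, so the
        // ideographic space survives, the query is not detected as
        // empty, and the parser fails with a SyntaxError.
        System.out.println(ascii.trim().isEmpty()); // true
        System.out.println(ideo.trim().isEmpty());  // false

        // Character.isWhitespace() (the check behind commons-lang
        // StringUtils.isBlank and JDK 11+ String.isBlank) does
        // recognize U+3000 as whitespace.
        System.out.println(Character.isWhitespace('\u3000')); // true
        System.out.println(ideo.isBlank());                   // true
    }
}
```

With the whitespace-aware check, a query of pure U+3000 is normalised to the same empty-query path as a query of ordinary spaces, avoiding the 400 Bad Request.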
[jira] [Assigned] (SOLR-14189) Some whitespace characters bypass zero-length test in query parsers leading to 400 Bad Request
[ https://issues.apache.org/jira/browse/SOLR-14189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler reassigned SOLR-14189: Assignee: Uwe Schindler > Some whitespace characters bypass zero-length test in query parsers leading > to 400 Bad Request > -- > > Key: SOLR-14189 > URL: https://issues.apache.org/jira/browse/SOLR-14189 > Project: Solr > Issue Type: Improvement > Components: query parsers >Reporter: Andy Webb >Assignee: Uwe Schindler >Priority: Major > Time Spent: 1h 50m > Remaining Estimate: 0h > > The edismax and some other query parsers treat pure whitespace queries as > empty queries, but they use Java's {{String.trim()}} method to normalise > queries. That method [only treats characters 0-32 as > whitespace|https://docs.oracle.com/javase/8/docs/api/java/lang/String.html#trim--]. > Other whitespace characters exist - such as {{U+3000 IDEOGRAPHIC SPACE}} - > which bypass the test and lead to {{400 Bad Request}} responses - see for > example {{/solr/mycollection/select?q=%E3%80%80&defType=edismax}} vs > {{/solr/mycollection/select?q=%20&defType=edismax}}. The first fails with the > exception: > {noformat} > org.apache.solr.search.SyntaxError: Cannot parse '': Encountered "" at > line 1, column 0. Was expecting one of: ... "+" ... "-" ... > ... "(" ... "*" ... ... ... ... ... > ... "[" ... "{" ... ... "filter(" ... ... > ... > {noformat} > [PR 1172|https://github.com/apache/lucene-solr/pull/1172] updates the dismax, > edismax and rerank query parsers to use > [StringUtils.isWhitespace()|https://commons.apache.org/proper/commons-lang/apidocs/org/apache/commons/lang3/StringUtils.html#isWhitespace-java.lang.String-] > which is aware of all whitespace characters. 
> Prior to the change, rerank behaves differently for U+3000 and U+0020 - with > the change, both the below give the "mandatory parameter" message: > {{q=greetings&rq=\{!rerank%20reRankQuery=$rqq%20reRankDocs=1000%20reRankWeight=3}&rqq=%E3%80%80}} > - generic 400 Bad Request > {{q=greetings&rq=\{!rerank%20reRankQuery=$rqq%20reRankDocs=1000%20reRankWeight=3}&rqq=%20}} > - 400 reporting "reRankQuery parameter is mandatory"
[jira] [Updated] (SOLR-14189) Some whitespace characters bypass zero-length test in query parsers leading to 400 Bad Request
[ https://issues.apache.org/jira/browse/SOLR-14189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated SOLR-14189: - Fix Version/s: 8.5 master (9.0) > Some whitespace characters bypass zero-length test in query parsers leading > to 400 Bad Request > -- > > Key: SOLR-14189 > URL: https://issues.apache.org/jira/browse/SOLR-14189 > Project: Solr > Issue Type: Improvement > Components: query parsers >Reporter: Andy Webb >Assignee: Uwe Schindler >Priority: Major > Fix For: master (9.0), 8.5 > > Time Spent: 1h 50m > Remaining Estimate: 0h > > The edismax and some other query parsers treat pure whitespace queries as > empty queries, but they use Java's {{String.trim()}} method to normalise > queries. That method [only treats characters 0-32 as > whitespace|https://docs.oracle.com/javase/8/docs/api/java/lang/String.html#trim--]. > Other whitespace characters exist - such as {{U+3000 IDEOGRAPHIC SPACE}} - > which bypass the test and lead to {{400 Bad Request}} responses - see for > example {{/solr/mycollection/select?q=%E3%80%80&defType=edismax}} vs > {{/solr/mycollection/select?q=%20&defType=edismax}}. The first fails with the > exception: > {noformat} > org.apache.solr.search.SyntaxError: Cannot parse '': Encountered "" at > line 1, column 0. Was expecting one of: ... "+" ... "-" ... > ... "(" ... "*" ... ... ... ... ... > ... "[" ... "{" ... ... "filter(" ... ... > ... > {noformat} > [PR 1172|https://github.com/apache/lucene-solr/pull/1172] updates the dismax, > edismax and rerank query parsers to use > [StringUtils.isWhitespace()|https://commons.apache.org/proper/commons-lang/apidocs/org/apache/commons/lang3/StringUtils.html#isWhitespace-java.lang.String-] > which is aware of all whitespace characters. 
> Prior to the change, rerank behaves differently for U+3000 and U+0020 - with > the change, both the below give the "mandatory parameter" message: > {{q=greetings&rq=\{!rerank%20reRankQuery=$rqq%20reRankDocs=1000%20reRankWeight=3}&rqq=%E3%80%80}} > - generic 400 Bad Request > {{q=greetings&rq=\{!rerank%20reRankQuery=$rqq%20reRankDocs=1000%20reRankWeight=3}&rqq=%20}} > - 400 reporting "reRankQuery parameter is mandatory"
[jira] [Commented] (SOLR-14189) Some whitespace characters bypass zero-length test in query parsers leading to 400 Bad Request
[ https://issues.apache.org/jira/browse/SOLR-14189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023766#comment-17023766 ] ASF subversion and git services commented on SOLR-14189: Commit efd0e8f3e89a954fcb870c9fab18cf19bcdbf97e in lucene-solr's branch refs/heads/master from andywebb1975 [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=efd0e8f ] SOLR-14189 switch from String.trim() to StringUtils.isBlank() (#1172) > Some whitespace characters bypass zero-length test in query parsers leading > to 400 Bad Request > -- > > Key: SOLR-14189 > URL: https://issues.apache.org/jira/browse/SOLR-14189 > Project: Solr > Issue Type: Improvement > Components: query parsers >Reporter: Andy Webb >Assignee: Uwe Schindler >Priority: Major > Fix For: master (9.0), 8.5 > > Time Spent: 2h > Remaining Estimate: 0h > > The edismax and some other query parsers treat pure whitespace queries as > empty queries, but they use Java's {{String.trim()}} method to normalise > queries. That method [only treats characters 0-32 as > whitespace|https://docs.oracle.com/javase/8/docs/api/java/lang/String.html#trim--]. > Other whitespace characters exist - such as {{U+3000 IDEOGRAPHIC SPACE}} - > which bypass the test and lead to {{400 Bad Request}} responses - see for > example {{/solr/mycollection/select?q=%E3%80%80&defType=edismax}} vs > {{/solr/mycollection/select?q=%20&defType=edismax}}. The first fails with the > exception: > {noformat} > org.apache.solr.search.SyntaxError: Cannot parse '': Encountered "" at > line 1, column 0. Was expecting one of: ... "+" ... "-" ... > ... "(" ... "*" ... ... ... ... ... > ... "[" ... "{" ... ... "filter(" ... ... > ... 
> {noformat} > [PR 1172|https://github.com/apache/lucene-solr/pull/1172] updates the dismax, > edismax and rerank query parsers to use > [StringUtils.isWhitespace()|https://commons.apache.org/proper/commons-lang/apidocs/org/apache/commons/lang3/StringUtils.html#isWhitespace-java.lang.String-] > which is aware of all whitespace characters. > Prior to the change, rerank behaves differently for U+3000 and U+0020 - with > the change, both the below give the "mandatory parameter" message: > {{q=greetings&rq=\{!rerank%20reRankQuery=$rqq%20reRankDocs=1000%20reRankWeight=3}&rqq=%E3%80%80}} > - generic 400 Bad Request > {{q=greetings&rq=\{!rerank%20reRankQuery=$rqq%20reRankDocs=1000%20reRankWeight=3}&rqq=%20}} > - 400 reporting "reRankQuery parameter is mandatory"
[GitHub] [lucene-solr] uschindler merged pull request #1172: SOLR-14189 switch from String.trim() to StringUtils.isBlank()
uschindler merged pull request #1172: SOLR-14189 switch from String.trim() to StringUtils.isBlank() URL: https://github.com/apache/lucene-solr/pull/1172
[jira] [Commented] (SOLR-14189) Some whitespace characters bypass zero-length test in query parsers leading to 400 Bad Request
[ https://issues.apache.org/jira/browse/SOLR-14189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023767#comment-17023767 ] ASF subversion and git services commented on SOLR-14189: Commit fd49c903b8193aa27c56655915c1bf741135fa18 in lucene-solr's branch refs/heads/master from Uwe Schindler [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=fd49c90 ] SOLR-14189: Add changes entry > Some whitespace characters bypass zero-length test in query parsers leading > to 400 Bad Request > -- > > Key: SOLR-14189 > URL: https://issues.apache.org/jira/browse/SOLR-14189 > Project: Solr > Issue Type: Improvement > Components: query parsers >Reporter: Andy Webb >Assignee: Uwe Schindler >Priority: Major > Fix For: master (9.0), 8.5 > > Time Spent: 2h > Remaining Estimate: 0h > > The edismax and some other query parsers treat pure whitespace queries as > empty queries, but they use Java's {{String.trim()}} method to normalise > queries. That method [only treats characters 0-32 as > whitespace|https://docs.oracle.com/javase/8/docs/api/java/lang/String.html#trim--]. > Other whitespace characters exist - such as {{U+3000 IDEOGRAPHIC SPACE}} - > which bypass the test and lead to {{400 Bad Request}} responses - see for > example {{/solr/mycollection/select?q=%E3%80%80&defType=edismax}} vs > {{/solr/mycollection/select?q=%20&defType=edismax}}. The first fails with the > exception: > {noformat} > org.apache.solr.search.SyntaxError: Cannot parse '': Encountered "" at > line 1, column 0. Was expecting one of: ... "+" ... "-" ... > ... "(" ... "*" ... ... ... ... ... > ... "[" ... "{" ... ... "filter(" ... ... > ... 
> {noformat} > [PR 1172|https://github.com/apache/lucene-solr/pull/1172] updates the dismax, > edismax and rerank query parsers to use > [StringUtils.isWhitespace()|https://commons.apache.org/proper/commons-lang/apidocs/org/apache/commons/lang3/StringUtils.html#isWhitespace-java.lang.String-] > which is aware of all whitespace characters. > Prior to the change, rerank behaves differently for U+3000 and U+0020 - with > the change, both the below give the "mandatory parameter" message: > {{q=greetings&rq=\{!rerank%20reRankQuery=$rqq%20reRankDocs=1000%20reRankWeight=3}&rqq=%E3%80%80}} > - generic 400 Bad Request > {{q=greetings&rq=\{!rerank%20reRankQuery=$rqq%20reRankDocs=1000%20reRankWeight=3}&rqq=%20}} > - 400 reporting "reRankQuery parameter is mandatory"
[jira] [Commented] (SOLR-14189) Some whitespace characters bypass zero-length test in query parsers leading to 400 Bad Request
[ https://issues.apache.org/jira/browse/SOLR-14189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023769#comment-17023769 ] ASF subversion and git services commented on SOLR-14189: Commit e934c8a7caee42565bd4c3982e6b46a561ebecfe in lucene-solr's branch refs/heads/branch_8x from Uwe Schindler [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=e934c8a ] SOLR-14189: Add changes entry > Some whitespace characters bypass zero-length test in query parsers leading > to 400 Bad Request > -- > > Key: SOLR-14189 > URL: https://issues.apache.org/jira/browse/SOLR-14189 > Project: Solr > Issue Type: Improvement > Components: query parsers >Reporter: Andy Webb >Assignee: Uwe Schindler >Priority: Major > Fix For: master (9.0), 8.5 > > Time Spent: 2h > Remaining Estimate: 0h > > The edismax and some other query parsers treat pure whitespace queries as > empty queries, but they use Java's {{String.trim()}} method to normalise > queries. That method [only treats characters 0-32 as > whitespace|https://docs.oracle.com/javase/8/docs/api/java/lang/String.html#trim--]. > Other whitespace characters exist - such as {{U+3000 IDEOGRAPHIC SPACE}} - > which bypass the test and lead to {{400 Bad Request}} responses - see for > example {{/solr/mycollection/select?q=%E3%80%80&defType=edismax}} vs > {{/solr/mycollection/select?q=%20&defType=edismax}}. The first fails with the > exception: > {noformat} > org.apache.solr.search.SyntaxError: Cannot parse '': Encountered "" at > line 1, column 0. Was expecting one of: ... "+" ... "-" ... > ... "(" ... "*" ... ... ... ... ... > ... "[" ... "{" ... ... "filter(" ... ... > ... 
> {noformat} > [PR 1172|https://github.com/apache/lucene-solr/pull/1172] updates the dismax, > edismax and rerank query parsers to use > [StringUtils.isWhitespace()|https://commons.apache.org/proper/commons-lang/apidocs/org/apache/commons/lang3/StringUtils.html#isWhitespace-java.lang.String-] > which is aware of all whitespace characters. > Prior to the change, rerank behaves differently for U+3000 and U+0020 - with > the change, both the below give the "mandatory parameter" message: > {{q=greetings&rq=\{!rerank%20reRankQuery=$rqq%20reRankDocs=1000%20reRankWeight=3}&rqq=%E3%80%80}} > - generic 400 Bad Request > {{q=greetings&rq=\{!rerank%20reRankQuery=$rqq%20reRankDocs=1000%20reRankWeight=3}&rqq=%20}} > - 400 reporting "reRankQuery parameter is mandatory" -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14189) Some whitespace characters bypass zero-length test in query parsers leading to 400 Bad Request
[ https://issues.apache.org/jira/browse/SOLR-14189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023768#comment-17023768 ] ASF subversion and git services commented on SOLR-14189: Commit 43085edaa6954f212d1a7f19a2f60e3d0de73ae6 in lucene-solr's branch refs/heads/branch_8x from andywebb1975 [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=43085ed ] SOLR-14189 switch from String.trim() to StringUtils.isBlank() (#1172) > Some whitespace characters bypass zero-length test in query parsers leading > to 400 Bad Request > -- > > Key: SOLR-14189 > URL: https://issues.apache.org/jira/browse/SOLR-14189 > Project: Solr > Issue Type: Improvement > Components: query parsers >Reporter: Andy Webb >Assignee: Uwe Schindler >Priority: Major > Fix For: master (9.0), 8.5 > > Time Spent: 2h > Remaining Estimate: 0h > > The edismax and some other query parsers treat pure whitespace queries as > empty queries, but they use Java's {{String.trim()}} method to normalise > queries. That method [only treats characters 0-32 as > whitespace|https://docs.oracle.com/javase/8/docs/api/java/lang/String.html#trim--]. > Other whitespace characters exist - such as {{U+3000 IDEOGRAPHIC SPACE}} - > which bypass the test and lead to {{400 Bad Request}} responses - see for > example {{/solr/mycollection/select?q=%E3%80%80&defType=edismax}} vs > {{/solr/mycollection/select?q=%20&defType=edismax}}. The first fails with the > exception: > {noformat} > org.apache.solr.search.SyntaxError: Cannot parse '': Encountered "" at > line 1, column 0. Was expecting one of: ... "+" ... "-" ... > ... "(" ... "*" ... ... ... ... ... > ... "[" ... "{" ... ... "filter(" ... ... > ... 
> {noformat} > [PR 1172|https://github.com/apache/lucene-solr/pull/1172] updates the dismax, > edismax and rerank query parsers to use > [StringUtils.isWhitespace()|https://commons.apache.org/proper/commons-lang/apidocs/org/apache/commons/lang3/StringUtils.html#isWhitespace-java.lang.String-] > which is aware of all whitespace characters. > Prior to the change, rerank behaves differently for U+3000 and U+0020 - with > the change, both the below give the "mandatory parameter" message: > {{q=greetings&rq=\{!rerank%20reRankQuery=$rqq%20reRankDocs=1000%20reRankWeight=3}&rqq=%E3%80%80}} > - generic 400 Bad Request > {{q=greetings&rq=\{!rerank%20reRankQuery=$rqq%20reRankDocs=1000%20reRankWeight=3}&rqq=%20}} > - 400 reporting "reRankQuery parameter is mandatory" -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (SOLR-14189) Some whitespace characters bypass zero-length test in query parsers leading to 400 Bad Request
[ https://issues.apache.org/jira/browse/SOLR-14189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated SOLR-14189: - Resolution: Fixed Status: Resolved (was: Patch Available) > Some whitespace characters bypass zero-length test in query parsers leading > to 400 Bad Request > -- > > Key: SOLR-14189 > URL: https://issues.apache.org/jira/browse/SOLR-14189 > Project: Solr > Issue Type: Improvement > Components: query parsers >Reporter: Andy Webb >Assignee: Uwe Schindler >Priority: Major > Fix For: master (9.0), 8.5 > > Time Spent: 2h > Remaining Estimate: 0h > > The edismax and some other query parsers treat pure whitespace queries as > empty queries, but they use Java's {{String.trim()}} method to normalise > queries. That method [only treats characters 0-32 as > whitespace|https://docs.oracle.com/javase/8/docs/api/java/lang/String.html#trim--]. > Other whitespace characters exist - such as {{U+3000 IDEOGRAPHIC SPACE}} - > which bypass the test and lead to {{400 Bad Request}} responses - see for > example {{/solr/mycollection/select?q=%E3%80%80&defType=edismax}} vs > {{/solr/mycollection/select?q=%20&defType=edismax}}. The first fails with the > exception: > {noformat} > org.apache.solr.search.SyntaxError: Cannot parse '': Encountered "" at > line 1, column 0. Was expecting one of: ... "+" ... "-" ... > ... "(" ... "*" ... ... ... ... ... > ... "[" ... "{" ... ... "filter(" ... ... > ... > {noformat} > [PR 1172|https://github.com/apache/lucene-solr/pull/1172] updates the dismax, > edismax and rerank query parsers to use > [StringUtils.isWhitespace()|https://commons.apache.org/proper/commons-lang/apidocs/org/apache/commons/lang3/StringUtils.html#isWhitespace-java.lang.String-] > which is aware of all whitespace characters. 
> Prior to the change, rerank behaves differently for U+3000 and U+0020 - with > the change, both the below give the "mandatory parameter" message: > {{q=greetings&rq=\{!rerank%20reRankQuery=$rqq%20reRankDocs=1000%20reRankWeight=3}&rqq=%E3%80%80}} > - generic 400 Bad Request > {{q=greetings&rq=\{!rerank%20reRankQuery=$rqq%20reRankDocs=1000%20reRankWeight=3}&rqq=%20}} > - 400 reporting "reRankQuery parameter is mandatory" -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14189) Some whitespace characters bypass zero-length test in query parsers leading to 400 Bad Request
[ https://issues.apache.org/jira/browse/SOLR-14189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023770#comment-17023770 ] Uwe Schindler commented on SOLR-14189: -- Thanks Andy! > Some whitespace characters bypass zero-length test in query parsers leading > to 400 Bad Request > -- > > Key: SOLR-14189 > URL: https://issues.apache.org/jira/browse/SOLR-14189 > Project: Solr > Issue Type: Improvement > Components: query parsers >Reporter: Andy Webb >Assignee: Uwe Schindler >Priority: Major > Fix For: master (9.0), 8.5 > > Time Spent: 2h > Remaining Estimate: 0h > > The edismax and some other query parsers treat pure whitespace queries as > empty queries, but they use Java's {{String.trim()}} method to normalise > queries. That method [only treats characters 0-32 as > whitespace|https://docs.oracle.com/javase/8/docs/api/java/lang/String.html#trim--]. > Other whitespace characters exist - such as {{U+3000 IDEOGRAPHIC SPACE}} - > which bypass the test and lead to {{400 Bad Request}} responses - see for > example {{/solr/mycollection/select?q=%E3%80%80&defType=edismax}} vs > {{/solr/mycollection/select?q=%20&defType=edismax}}. The first fails with the > exception: > {noformat} > org.apache.solr.search.SyntaxError: Cannot parse '': Encountered "" at > line 1, column 0. Was expecting one of: ... "+" ... "-" ... > ... "(" ... "*" ... ... ... ... ... > ... "[" ... "{" ... ... "filter(" ... ... > ... > {noformat} > [PR 1172|https://github.com/apache/lucene-solr/pull/1172] updates the dismax, > edismax and rerank query parsers to use > [StringUtils.isWhitespace()|https://commons.apache.org/proper/commons-lang/apidocs/org/apache/commons/lang3/StringUtils.html#isWhitespace-java.lang.String-] > which is aware of all whitespace characters. 
> Prior to the change, rerank behaves differently for U+3000 and U+0020 - with > the change, both the below give the "mandatory parameter" message: > {{q=greetings&rq=\{!rerank%20reRankQuery=$rqq%20reRankDocs=1000%20reRankWeight=3}&rqq=%E3%80%80}} > - generic 400 Bad Request > {{q=greetings&rq=\{!rerank%20reRankQuery=$rqq%20reRankDocs=1000%20reRankWeight=3}&rqq=%20}} > - 400 reporting "reRankQuery parameter is mandatory" -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
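The root cause described above is that String.trim() only strips code points at or below U+0020, while StringUtils.isWhitespace() delegates to Character.isWhitespace() for each character. A minimal, dependency-free sketch of the difference (WhitespaceCheck and isAllWhitespace are illustrative names, not code from the patch):

```java
public class WhitespaceCheck {
    /** True only when the string is non-empty and every char is Unicode whitespace. */
    public static boolean isAllWhitespace(String s) {
        return !s.isEmpty() && s.chars().allMatch(Character::isWhitespace);
    }

    public static void main(String[] args) {
        String ideographicSpace = "\u3000"; // U+3000 IDEOGRAPHIC SPACE
        // String.trim() only strips chars <= U+0020, so U+3000 survives it:
        System.out.println(ideographicSpace.trim().isEmpty());   // false
        // A Unicode-aware per-character test recognises it as whitespace:
        System.out.println(isAllWhitespace(ideographicSpace));   // true
        System.out.println(isAllWhitespace(" "));                // true
    }
}
```

This is why a query consisting solely of U+3000 slipped past the zero-length check and reached the parser as a non-empty string.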
[jira] [Commented] (LUCENE-9123) JapaneseTokenizer with search mode doesn't work with SynonymGraphFilter
[ https://issues.apache.org/jira/browse/LUCENE-9123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023778#comment-17023778 ] Tomoko Uchida commented on LUCENE-9123: --- When reproducing this issue I noticed that JapaneseTokenizer (mode=search) gives positionIncrements=1 for the decompounded token "株式" instead of 0. This looks strange to me; is this expected behaviour? If not, it may affect the synonym handling. > JapaneseTokenizer with search mode doesn't work with SynonymGraphFilter > --- > > Key: LUCENE-9123 > URL: https://issues.apache.org/jira/browse/LUCENE-9123 > Project: Lucene - Core > Issue Type: Bug > Components: modules/analysis >Affects Versions: 8.4 >Reporter: Kazuaki Hiraga >Assignee: Tomoko Uchida >Priority: Major > Attachments: LUCENE-9123.patch, LUCENE-9123_8x.patch > > > JapaneseTokenizer with `mode=search` or `mode=extended` doesn't work with > either SynonymGraphFilter or SynonymFilter when JT generates multiple > tokens as output. If we use `mode=normal`, it should be fine. However, we > would like to use decomposed tokens that can maximize the chance of increasing > recall. 
> Snippet of schema (the XML element names were stripped by the mail archive; reconstructed here from Solr's stock text_ja fieldType, which matches the surviving attributes): > {code:xml} > <fieldType name="text_ja" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="false"> > <analyzer> > <tokenizer class="solr.JapaneseTokenizerFactory" mode="search"/> > <filter class="solr.SynonymGraphFilterFactory" synonyms="lang/synonyms_ja.txt" > tokenizerFactory="solr.JapaneseTokenizerFactory"/> > <filter class="solr.JapaneseBaseFormFilterFactory"/> > <filter class="solr.JapanesePartOfSpeechStopFilterFactory" tags="lang/stoptags_ja.txt" /> > <filter class="solr.CJKWidthFilterFactory"/> > <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_ja.txt"/> > <filter class="solr.JapaneseKatakanaStemFilterFactory" minimumLength="4"/> > <filter class="solr.LowerCaseFilterFactory"/> > </analyzer> > </fieldType> > {code} > A synonym entry that triggers the error: > {noformat} > 株式会社,コーポレーション > {noformat} > The console output: > {noformat} > $ ./bin/solr create_core -c jp_test -d ../config/solrconfs > ERROR: Error CREATEing SolrCore 'jp_test': Unable to create core [jp_test3] > Caused by: term: 株式会社 analyzed to a token (株式会社) with position increment != 1 > (got: 0) > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
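The "position increment != 1" error is raised while SynonymMap's parser re-analyzes each synonym entry with the configured tokenizer. A simplified, dependency-free sketch of that validation (Token and validate are illustrative stand-ins for the Lucene internals, not the actual SynonymMap code):

```java
import java.util.List;

public class SynonymAnalyzeSketch {
    /** Hypothetical stand-in for a Lucene token: term text plus position increment. */
    public static final class Token {
        final String term;
        final int posInc;
        public Token(String term, int posInc) { this.term = term; this.posInc = posInc; }
    }

    /**
     * Simplified version of the guard in SynonymMap.Parser.analyze: every token
     * produced by analyzing a synonym entry must advance the position by exactly 1.
     * A decompounding tokenizer that emits a token at the same position as the
     * previous one (posInc == 0, i.e. a branch in the token graph) trips this check.
     */
    public static void validate(String entry, List<Token> analyzed) {
        for (Token t : analyzed) {
            if (t.posInc != 1) {
                throw new IllegalArgumentException("term: " + entry
                    + " analyzed to a token (" + t.term
                    + ") with position increment != 1 (got: " + t.posInc + ")");
            }
        }
    }

    public static void main(String[] args) {
        // mode=normal: a single token per entry, passes the check
        validate("株式会社", List.of(new Token("株式会社", 1)));
        // mode=search: the compound token overlaps its decompounded parts, fails
        try {
            validate("株式会社", List.of(
                new Token("株式", 1), new Token("株式会社", 0), new Token("会社", 1)));
        } catch (IllegalArgumentException expected) {
            System.out.println(expected.getMessage());
        }
    }
}
```

This is why mode=normal works while mode=search and mode=extended fail: only the decompounding modes produce overlapping tokens for the synonym text.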
[jira] [Commented] (SOLR-14202) Old segments are not deleted after commit
[ https://issues.apache.org/jira/browse/SOLR-14202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023785#comment-17023785 ] Jörn Franke commented on SOLR-14202: ok, thanks for the feedback and details. I will check the attached program. > Old segments are not deleted after commit > - > > Key: SOLR-14202 > URL: https://issues.apache.org/jira/browse/SOLR-14202 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: SolrCloud >Affects Versions: 8.4 >Reporter: Jörn Franke >Priority: Major > Attachments: eoe.zip > > > The data directory of a collection is growing and growing. It seems that old > segments are not deleted. They are only deleted during startup of Solr. > How to reproduce: have any collection (e.g. the example collection) and start > indexing documents. Even during the indexing the data directory is growing > significantly - much more than expected (by several orders of magnitude). If certain > documents are updated (without significantly increasing the amount of data) > the index data directory again grows by several orders of magnitude. Even for small > collections the needed space explodes. > This reduces significantly if Solr is stopped and then started. During > startup (not shutdown) Solr purges all those segments if not needed (* > sometimes some, but not a significant amount, is deleted during shutdown). This > is of course not a good workaround for normal operations. > It does not seem to have an effect on queries (their performance does not seem > to change). > The configs have not changed before and after the upgrade (e.g. from Solr 8.2 > to 8.3 to 8.4, not across major versions), so I assume it could be related to > Solr 8.4. It may also have been in Solr 8.3 (not sure), but not in 8.2. > > IndexConfig is pretty much default: Lock type: native, autoCommit: 15000, > openSearcher=false, autoSoftCommit -1 (reproducible with autoCommit 5000). 
> Nevertheless, it did not happen in previous versions of Solr and the config > did not change. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-14202) Old segments are not deleted after commit
[ https://issues.apache.org/jira/browse/SOLR-14202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023785#comment-17023785 ] Jörn Franke edited comment on SOLR-14202 at 1/26/20 12:12 PM: -- ok, thanks for the feedback and details. I will check the attached program. I was suspecting the FreeTextLookupFactory for the suggester, but in the end I could not verify it. The strange thing is that the permissions/configurations etc. have not changed. was (Author: jornfranke): ok, thanks for the feedabck and details. I will check the attached program. > Old segments are not deleted after commit > - > > Key: SOLR-14202 > URL: https://issues.apache.org/jira/browse/SOLR-14202 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: SolrCloud >Affects Versions: 8.4 >Reporter: Jörn Franke >Priority: Major > Attachments: eoe.zip > > > The data directory of a collection is growing and growing. It seems that old > segments are not deleted. They are only deleted during startup of Solr. > How to reproduce: have any collection (e.g. the example collection) and start > indexing documents. Even during the indexing the data directory is growing > significantly - much more than expected (by several orders of magnitude). If certain > documents are updated (without significantly increasing the amount of data) > the index data directory again grows by several orders of magnitude. Even for small > collections the needed space explodes. > This reduces significantly if Solr is stopped and then started. During > startup (not shutdown) Solr purges all those segments if not needed (* > sometimes some, but not a significant amount, is deleted during shutdown). This > is of course not a good workaround for normal operations. > It does not seem to have an effect on queries (their performance does not seem > to change). > The configs have not changed before and after the upgrade (e.g. 
from Solr 8.2 > to 8.3 to 8.4, not across major versions), so I assume it could be related to > Solr 8.4. It may also have been in Solr 8.3 (not sure), but not in 8.2. > > IndexConfig is pretty much default: Lock type: native, autoCommit: 15000, > openSearcher=false, autoSoftCommit -1 (reproducible with autoCommit 5000). > Nevertheless, it did not happen in previous versions of Solr and the config > did not change. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14202) Old segments are not deleted after commit
[ https://issues.apache.org/jira/browse/SOLR-14202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023790#comment-17023790 ] Jörn Franke commented on SOLR-14202: The files are, by the way, deleted when Solr starts - not on shutdown. > Old segments are not deleted after commit > - > > Key: SOLR-14202 > URL: https://issues.apache.org/jira/browse/SOLR-14202 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: SolrCloud >Affects Versions: 8.4 >Reporter: Jörn Franke >Priority: Major > Attachments: eoe.zip > > > The data directory of a collection is growing and growing. It seems that old > segments are not deleted. They are only deleted during startup of Solr. > How to reproduce: have any collection (e.g. the example collection) and start > indexing documents. Even during the indexing the data directory is growing > significantly - much more than expected (by several orders of magnitude). If certain > documents are updated (without significantly increasing the amount of data) > the index data directory again grows by several orders of magnitude. Even for small > collections the needed space explodes. > This reduces significantly if Solr is stopped and then started. During > startup (not shutdown) Solr purges all those segments if not needed (* > sometimes some, but not a significant amount, is deleted during shutdown). This > is of course not a good workaround for normal operations. > It does not seem to have an effect on queries (their performance does not seem > to change). > The configs have not changed before and after the upgrade (e.g. from Solr 8.2 > to 8.3 to 8.4, not across major versions), so I assume it could be related to > Solr 8.4. It may also have been in Solr 8.3 (not sure), but not in 8.2. > > IndexConfig is pretty much default: Lock type: native, autoCommit: 15000, > openSearcher=false, autoSoftCommit -1 (reproducible with autoCommit 5000). 
> Nevertheless, it did not happen in previous versions of Solr and the config > did not change. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
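For reference, the autoCommit settings described by the reporter correspond to a solrconfig.xml fragment along these lines (a sketch reconstructed from the description, not the actual file attached to the issue):

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <!-- hard commit every 15s; the reporter also reproduced the issue at 5000 -->
    <maxTime>15000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <autoSoftCommit>
    <!-- -1 disables time-based soft commits -->
    <maxTime>-1</maxTime>
  </autoSoftCommit>
</updateHandler>
```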
[jira] [Commented] (SOLR-14189) Some whitespace characters bypass zero-length test in query parsers leading to 400 Bad Request
[ https://issues.apache.org/jira/browse/SOLR-14189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023821#comment-17023821 ] Andy Webb commented on SOLR-14189: -- Happy to help - thanks Uwe (and Christine)! > Some whitespace characters bypass zero-length test in query parsers leading > to 400 Bad Request > -- > > Key: SOLR-14189 > URL: https://issues.apache.org/jira/browse/SOLR-14189 > Project: Solr > Issue Type: Improvement > Components: query parsers >Reporter: Andy Webb >Assignee: Uwe Schindler >Priority: Major > Fix For: master (9.0), 8.5 > > Time Spent: 2h > Remaining Estimate: 0h > > The edismax and some other query parsers treat pure whitespace queries as > empty queries, but they use Java's {{String.trim()}} method to normalise > queries. That method [only treats characters 0-32 as > whitespace|https://docs.oracle.com/javase/8/docs/api/java/lang/String.html#trim--]. > Other whitespace characters exist - such as {{U+3000 IDEOGRAPHIC SPACE}} - > which bypass the test and lead to {{400 Bad Request}} responses - see for > example {{/solr/mycollection/select?q=%E3%80%80&defType=edismax}} vs > {{/solr/mycollection/select?q=%20&defType=edismax}}. The first fails with the > exception: > {noformat} > org.apache.solr.search.SyntaxError: Cannot parse '': Encountered "" at > line 1, column 0. Was expecting one of: ... "+" ... "-" ... > ... "(" ... "*" ... ... ... ... ... > ... "[" ... "{" ... ... "filter(" ... ... > ... > {noformat} > [PR 1172|https://github.com/apache/lucene-solr/pull/1172] updates the dismax, > edismax and rerank query parsers to use > [StringUtils.isWhitespace()|https://commons.apache.org/proper/commons-lang/apidocs/org/apache/commons/lang3/StringUtils.html#isWhitespace-java.lang.String-] > which is aware of all whitespace characters. 
> Prior to the change, rerank behaves differently for U+3000 and U+0020 - with > the change, both the below give the "mandatory parameter" message: > {{q=greetings&rq=\{!rerank%20reRankQuery=$rqq%20reRankDocs=1000%20reRankWeight=3}&rqq=%E3%80%80}} > - generic 400 Bad Request > {{q=greetings&rq=\{!rerank%20reRankQuery=$rqq%20reRankDocs=1000%20reRankWeight=3}&rqq=%20}} > - 400 reporting "reRankQuery parameter is mandatory" -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Created] (SOLR-14219) OverseerSolrResponse's serialVersionUID has changed
Andy Webb created SOLR-14219: Summary: OverseerSolrResponse's serialVersionUID has changed Key: SOLR-14219 URL: https://issues.apache.org/jira/browse/SOLR-14219 Project: Solr Issue Type: Bug Security Level: Public (Default Security Level. Issues are Public) Components: SolrCloud Reporter: Andy Webb When the {{useUnsafeOverseerResponse=true}} option introduced in SOLR-14095 is used, the serialized OverseerSolrResponse has a different serialVersionUID to earlier versions, making it backwards-incompatible. (PR incoming) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9123) JapaneseTokenizer with search mode doesn't work with SynonymGraphFilter
[ https://issues.apache.org/jira/browse/LUCENE-9123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023863#comment-17023863 ] Tomoko Uchida commented on LUCENE-9123: --- Thanks [~h.kazuaki] for updating the patches. +1, I will commit them with CHANGES and MIGRATE entries next weekend or so (sorry for the delay, I may not have time to test them locally right now). Meanwhile, can you tell us the e-mail address that will be logged as the author of the patch? > JapaneseTokenizer with search mode doesn't work with SynonymGraphFilter > --- > > Key: LUCENE-9123 > URL: https://issues.apache.org/jira/browse/LUCENE-9123 > Project: Lucene - Core > Issue Type: Bug > Components: modules/analysis >Affects Versions: 8.4 >Reporter: Kazuaki Hiraga >Assignee: Tomoko Uchida >Priority: Major > Attachments: LUCENE-9123.patch, LUCENE-9123_8x.patch > > > JapaneseTokenizer with `mode=search` or `mode=extended` doesn't work with > either SynonymGraphFilter or SynonymFilter when JT generates multiple > tokens as output. If we use `mode=normal`, it should be fine. However, we > would like to use decomposed tokens that can maximize the chance of increasing > recall. 
> Snippet of schema (the XML element names were stripped by the mail archive; reconstructed here from Solr's stock text_ja fieldType, which matches the surviving attributes): > {code:xml} > <fieldType name="text_ja" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="false"> > <analyzer> > <tokenizer class="solr.JapaneseTokenizerFactory" mode="search"/> > <filter class="solr.SynonymGraphFilterFactory" synonyms="lang/synonyms_ja.txt" > tokenizerFactory="solr.JapaneseTokenizerFactory"/> > <filter class="solr.JapaneseBaseFormFilterFactory"/> > <filter class="solr.JapanesePartOfSpeechStopFilterFactory" tags="lang/stoptags_ja.txt" /> > <filter class="solr.CJKWidthFilterFactory"/> > <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_ja.txt"/> > <filter class="solr.JapaneseKatakanaStemFilterFactory" minimumLength="4"/> > <filter class="solr.LowerCaseFilterFactory"/> > </analyzer> > </fieldType> > {code} > A synonym entry that triggers the error: > {noformat} > 株式会社,コーポレーション > {noformat} > The console output: > {noformat} > $ ./bin/solr create_core -c jp_test -d ../config/solrconfs > ERROR: Error CREATEing SolrCore 'jp_test': Unable to create core [jp_test3] > Caused by: term: 株式会社 analyzed to a token (株式会社) with position increment != 1 > (got: 0) > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] andywebb1975 opened a new pull request #1210: SOLR-14219 force serialVersionUID of OverseerSolrResponse
andywebb1975 opened a new pull request #1210: SOLR-14219 force serialVersionUID of OverseerSolrResponse URL: https://github.com/apache/lucene-solr/pull/1210 # Description When the useUnsafeOverseerResponse=true option introduced in SOLR-14095 is used, the serialized OverseerSolrResponse has a different serialVersionUID to earlier versions, making it backwards-incompatible. # Solution This PR forces the serialVersionUID of OverseerSolrResponse to be the same value as before. # Tests Tested in a prototype environment with a mix of 8.4.1 and master-branch Solr nodes. # Checklist Please review the following and check all that apply: - [ ] I have reviewed the guidelines for [How to Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms to the standards described there to the best of my ability. - [ ] I have created a Jira issue and added the issue ID to my pull request title. - [ ] I have given Solr maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended) - [ ] I have developed this patch against the `master` branch. - [ ] I have run `ant precommit` and the appropriate test suite. - [ ] I have added tests for my changes. - [ ] I have added documentation for the [Ref Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) (for Solr changes only). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (SOLR-14219) OverseerSolrResponse's serialVersionUID has changed
[ https://issues.apache.org/jira/browse/SOLR-14219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Webb updated SOLR-14219: - Description: When the {{useUnsafeOverseerResponse=true}} option introduced in SOLR-14095 is used, the serialized OverseerSolrResponse has a different serialVersionUID to earlier versions, making it backwards-incompatible. https://github.com/apache/lucene-solr/pull/1210 forces the serialVersionUID to its old value, so old and new nodes become compatible. was: When the {{useUnsafeOverseerResponse=true}} option introduced in SOLR-14095 is used, the serialized OverseerSolrResponse has a different serialVersionUID to earlier versions, making it backwards-incompatible. (PR incoming) > OverseerSolrResponse's serialVersionUID has changed > --- > > Key: SOLR-14219 > URL: https://issues.apache.org/jira/browse/SOLR-14219 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: SolrCloud >Reporter: Andy Webb >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > When the {{useUnsafeOverseerResponse=true}} option introduced in SOLR-14095 > is used, the serialized OverseerSolrResponse has a different serialVersionUID > to earlier versions, making it backwards-incompatible. > https://github.com/apache/lucene-solr/pull/1210 forces the serialVersionUID > to its old value, so old and new nodes become compatible. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
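Explicitly pinning serialVersionUID is the standard guard against this class of incompatibility: without the declaration, the JVM derives the UID from details of the class, so unrelated code changes alter it and deserialization across versions fails with InvalidClassException. A sketch with a hypothetical Response class (the pinned value below is the old UID quoted in the related SOLR-14095 comment; PinnedUidDemo is an illustrative name, not the actual patch):

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

public class PinnedUidDemo {
    /** Hypothetical stand-in for OverseerSolrResponse. */
    public static class Response implements Serializable {
        // Pinning the UID keeps streams written by older nodes readable: without
        // this line the UID is auto-generated from the class shape, and adding or
        // removing members silently changes it, breaking mixed-version clusters.
        private static final long serialVersionUID = 4721653044098960880L;
        String status;
    }

    public static void main(String[] args) {
        // The serialization machinery reports the declared UID, not a derived one.
        long uid = ObjectStreamClass.lookup(Response.class).getSerialVersionUID();
        System.out.println(uid == 4721653044098960880L);  // true
    }
}
```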
[jira] [Commented] (SOLR-14095) Replace Java serialization with Javabin in Overseer operations
[ https://issues.apache.org/jira/browse/SOLR-14095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023866#comment-17023866 ] Andy Webb commented on SOLR-14095: -- hi, I've been experimenting with upgrading to 8.5.0+ in a prototyping environment using {{useUnsafeOverseerResponse=true}} and have found that a mixed pool of older/newer nodes gives the exception {{java.io.InvalidClassException: org.apache.solr.cloud.OverseerSolrResponse; local class incompatible: stream classdesc serialVersionUID = 4721653044098960880, local class serialVersionUID = -3791204262816422245}} (or vice-versa, depending on which node is the overseer). I've attached a PR to SOLR-14219 which I've found resolved this issue - please would someone review this? thanks, Andy > Replace Java serialization with Javabin in Overseer operations > -- > > Key: SOLR-14095 > URL: https://issues.apache.org/jira/browse/SOLR-14095 > Project: Solr > Issue Type: Task >Reporter: Robert Muir >Assignee: Tomas Eduardo Fernandez Lobbe >Priority: Major > Fix For: master (9.0), 8.5 > > Attachments: SOLR-14095-json.patch, json-nl.patch > > Time Spent: 1h 20m > Remaining Estimate: 0h > > Removing the use of serialization is greatly preferred. > But if serialization over the wire must really happen, then we must use JDK's > serialization filtering capability to prevent havoc. > https://docs.oracle.com/javase/10/core/serialization-filtering1.htm#JSCOR-GUID-3ECB288D-E5BD-4412-892F-E9BB11D4C98A -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (SOLR-14219) OverseerSolrResponse's serialVersionUID has changed
[ https://issues.apache.org/jira/browse/SOLR-14219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Webb updated SOLR-14219: - Status: Patch Available (was: Open) > OverseerSolrResponse's serialVersionUID has changed > --- > > Key: SOLR-14219 > URL: https://issues.apache.org/jira/browse/SOLR-14219 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: SolrCloud >Reporter: Andy Webb >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > When the {{useUnsafeOverseerResponse=true}} option introduced in SOLR-14095 > is used, the serialized OverseerSolrResponse has a different serialVersionUID > to earlier versions, making it backwards-incompatible. > https://github.com/apache/lucene-solr/pull/1210 forces the serialVersionUID > to its old value, so old and new nodes become compatible. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (LUCENE-9173) SynonymGraphFilter doesn't correctly consume decompounded tokens (branched token graph)
[ https://issues.apache.org/jira/browse/LUCENE-9173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tomoko Uchida updated LUCENE-9173: -- Description: This is a derived issue from LUCENE-9123. When the tokenizer that is given to SynonymGraphFilter decompound tokens or emit multiple tokens at the same position, SynonymGraphFilter cannot correctly handle them (an exception will be thrown). For example, JapaneseTokenizer (mode=SEARCH) would emit a token and two decompounded tokens for the text "株式会社": {code:java} 株式会社 (positionIncrement=0, positionLength=2) 株式 (positionIncrement=1, positionLength=1) 会社 (positionIncrement=1, positionLength=1) {code} Then if we give a synonym "株式会社,コーポレーション" by SynonymGraphFilterFactory (set tokenizerFactory=JapaneseTokenizerFactory) this exception is thrown. {code:java} Caused by: java.lang.IllegalArgumentException: term: 株式会社 analyzed to a token (株式会社) with position increment != 1 (got: 0) at org.apache.lucene.analysis.synonym.SynonymMap$Parser.analyze(SynonymMap.java:325) ~[lucene-analyzers-common-8.4.0.jar:8.4.0 bc02ab906445fcf4e297f4ef00ab4a54fdd72ca2 - jpountz - 2019-12-19 20:16:38] at org.apache.lucene.analysis.synonym.SolrSynonymParser.addInternal(SolrSynonymParser.java:114) ~[lucene-analyzers-common-8.4.0.jar:8.4.0 bc02ab906445fcf4e297f4ef00ab4a54fdd72ca2 - jpountz - 2019-12-19 20:16:38] at org.apache.lucene.analysis.synonym.SolrSynonymParser.parse(SolrSynonymParser.java:70) ~[lucene-analyzers-common-8.4.0.jar:8.4.0 bc02ab906445fcf4e297f4ef00ab4a54fdd72ca2 - jpountz - 2019-12-19 20:16:38] at org.apache.lucene.analysis.synonym.SynonymGraphFilterFactory.loadSynonyms(SynonymGraphFilterFactory.java:179) ~[lucene-analyzers-common-8.4.0.jar:8.4.0 bc02ab906445fcf4e297f4ef00ab4a54fdd72ca2 - jpountz - 2019-12-19 20:16:38] at org.apache.lucene.analysis.synonym.SynonymGraphFilterFactory.inform(SynonymGraphFilterFactory.java:154) ~[lucene-analyzers-common-8.4.0.jar:8.4.0 bc02ab906445fcf4e297f4ef00ab4a54fdd72ca2 - 
jpountz - 2019-12-19 20:16:38] {code} This isn't only limited to JapaneseTokenizer but a more general issue about handling branched token graph (decompounded tokens in the midstream). was: This is a derived issue from LUCENE-9123. When the tokenizer that is given to SynonymGraphFilter decompound tokens or emit multiple tokens at the same position, SynonymGraphFilter cannot correctly handle them (an exception will be thrown). For example, JapaneseTokenizer (mode=SEARCH) would emit a token and two decompounded tokens for the text "株式会社": {code:java} 株式会社 (positionIncrement=0, positionLength=2) 株式 (positionIncrement=1, positionLength=1) 会社 (positionIncrement=1, positionLength=1) {code} Then if we give synonym "株式会社,コーポレーション" by SynonymGraphFilter (set tokenizerFactory=JapaneseTokenizerFactory) this exception is thrown. {code:java} Caused by: java.lang.IllegalArgumentException: term: 株式会社 analyzed to a token (株式会社) with position increment != 1 (got: 0) at org.apache.lucene.analysis.synonym.SynonymMap$Parser.analyze(SynonymMap.java:325) ~[lucene-analyzers-common-8.4.0.jar:8.4.0 bc02ab906445fcf4e297f4ef00ab4a54fdd72ca2 - jpountz - 2019-12-19 20:16:38] at org.apache.lucene.analysis.synonym.SolrSynonymParser.addInternal(SolrSynonymParser.java:114) ~[lucene-analyzers-common-8.4.0.jar:8.4.0 bc02ab906445fcf4e297f4ef00ab4a54fdd72ca2 - jpountz - 2019-12-19 20:16:38] at org.apache.lucene.analysis.synonym.SolrSynonymParser.parse(SolrSynonymParser.java:70) ~[lucene-analyzers-common-8.4.0.jar:8.4.0 bc02ab906445fcf4e297f4ef00ab4a54fdd72ca2 - jpountz - 2019-12-19 20:16:38] at org.apache.lucene.analysis.synonym.SynonymGraphFilterFactory.loadSynonyms(SynonymGraphFilterFactory.java:179) ~[lucene-analyzers-common-8.4.0.jar:8.4.0 bc02ab906445fcf4e297f4ef00ab4a54fdd72ca2 - jpountz - 2019-12-19 20:16:38] at org.apache.lucene.analysis.synonym.SynonymGraphFilterFactory.inform(SynonymGraphFilterFactory.java:154) ~[lucene-analyzers-common-8.4.0.jar:8.4.0 bc02ab906445fcf4e297f4ef00ab4a54fdd72ca2 
- jpountz - 2019-12-19 20:16:38] {code} This isn't only limited to JapaneseTokenizer but a more general issue about handling branched token graph (decompounded tokens in the midstream). > SynonymGraphFilter doesn't correctly consume decompounded tokens (branched > token graph) > > > Key: LUCENE-9173 > URL: https://issues.apache.org/jira/browse/LUCENE-9173 > Project: Lucene - Core > Issue Type: Bug > Components: modules/analysis >Reporter: Tomoko Uchida >Priority: Minor > > This is a derived issue from LUCENE-9123. > When the tokenizer that is given to SynonymGraphFilter decompound tokens or > emit multiple tokens at the same position, Synonym
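The failure mode in LUCENE-9173 can be modeled in miniature without Lucene on the classpath: SynonymMap.Parser.analyze rejects any analyzed synonym token whose position increment is not 1, which is exactly what a decompounding tokenizer produces for the overlapping compound token. The sketch below uses a hypothetical Token record (not Lucene's attribute API) and returns the error message instead of throwing, purely to keep the demo self-contained:

```java
import java.util.List;

// Minimal stand-in for one entry of an analyzed token stream.
// This is NOT Lucene's API -- just enough to model the check.
record Token(String term, int positionIncrement, int positionLength) {}

public class SynonymAnalyzeCheck {

    // Mirrors the guard in SynonymMap.Parser.analyze(): every analyzed
    // synonym token must advance the position by exactly 1. The real code
    // throws IllegalArgumentException with this message; we return it.
    static String analyze(String term, List<Token> tokens) {
        for (Token t : tokens) {
            if (t.positionIncrement() != 1) {
                return "term: " + term + " analyzed to a token (" + t.term()
                        + ") with position increment != 1 (got: "
                        + t.positionIncrement() + ")";
            }
        }
        return null; // graph accepted
    }

    static String demo() {
        // What JapaneseTokenizer (mode=SEARCH) emits for 株式会社,
        // per the description above: the compound plus two parts.
        List<Token> graph = List.of(
                new Token("株式会社", 0, 2),  // overlapping compound, posInc=0
                new Token("株式", 1, 1),
                new Token("会社", 1, 1));
        return analyze("株式会社", graph);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The overlapping compound token trips the posInc check immediately, which is why any branched token graph (not just JapaneseTokenizer's) hits this exception.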
[jira] [Comment Edited] (LUCENE-4702) Terms dictionary compression
[ https://issues.apache.org/jira/browse/LUCENE-4702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023871#comment-17023871 ] Adrien Grand edited comment on LUCENE-4702 at 1/26/20 5:09 PM: --- Nightly benchmarks are seeing worse slowdowns than what I was observing locally: - slower indexing http://people.apache.org/~mikemccand/lucenebench/indexing.html - slower fuzzy queries http://people.apache.org/~mikemccand/lucenebench/Fuzzy1.html - slower wildcard queries http://people.apache.org/~mikemccand/lucenebench/Wildcard.html - slower respell http://people.apache.org/~mikemccand/lucenebench/Respell.html Interestingly PK lookups got faster ( ! ) http://people.apache.org/~mikemccand/lucenebench/PKLookup.html and prefix queries are only barely slower http://people.apache.org/~mikemccand/lucenebench/Prefix3.html. I'll look into it. was (Author: jpountz): Nightly benchmarks are seeing worse slowdowns than what I was observing locally: - slower indexing http://people.apache.org/~mikemccand/lucenebench/indexing.html - slower fuzzy queries http://people.apache.org/~mikemccand/lucenebench/Fuzzy1.html - slower wildcard queries http://people.apache.org/~mikemccand/lucenebench/Wildcard.html - slower respell http://people.apache.org/~mikemccand/lucenebench/Respell.html Interestingly PK lookups got faster (!) http://people.apache.org/~mikemccand/lucenebench/PKLookup.html and prefix queries are only barely slower http://people.apache.org/~mikemccand/lucenebench/Prefix3.html. I'll look into it. 
> Terms dictionary compression > > > Key: LUCENE-4702 > URL: https://issues.apache.org/jira/browse/LUCENE-4702 > Project: Lucene - Core > Issue Type: Wish >Reporter: Adrien Grand >Assignee: Adrien Grand >Priority: Trivial > Attachments: LUCENE-4702.patch, LUCENE-4702.patch > > Time Spent: 3.5h > Remaining Estimate: 0h > > I've done a quick test with the block tree terms dictionary by replacing a > call to IndexOutput.writeBytes to write suffix bytes with a call to > LZ4.compressHC to test the peformance hit. Interestingly, search performance > was very good (see comparison table below) and the tim files were 14% smaller > (from 150432 bytes overall to 129516). > {noformat} > TaskQPS baseline StdDevQPS compressed StdDev > Pct diff > Fuzzy1 111.50 (2.0%) 78.78 (1.5%) > -29.4% ( -32% - -26%) > Fuzzy2 36.99 (2.7%) 28.59 (1.5%) > -22.7% ( -26% - -18%) > Respell 122.86 (2.1%) 103.89 (1.7%) > -15.4% ( -18% - -11%) > Wildcard 100.58 (4.3%) 94.42 (3.2%) > -6.1% ( -13% -1%) > Prefix3 124.90 (5.7%) 122.67 (4.7%) > -1.8% ( -11% -9%) >OrHighLow 169.87 (6.8%) 167.77 (8.0%) > -1.2% ( -15% - 14%) > LowTerm 1949.85 (4.5%) 1929.02 (3.4%) > -1.1% ( -8% -7%) > AndHighLow 2011.95 (3.5%) 1991.85 (3.3%) > -1.0% ( -7% -5%) > OrHighHigh 155.63 (6.7%) 154.12 (7.9%) > -1.0% ( -14% - 14%) > AndHighHigh 341.82 (1.2%) 339.49 (1.7%) > -0.7% ( -3% -2%) >OrHighMed 217.55 (6.3%) 216.16 (7.1%) > -0.6% ( -13% - 13%) > IntNRQ 53.10 (10.9%) 52.90 (8.6%) > -0.4% ( -17% - 21%) > MedTerm 998.11 (3.8%) 994.82 (5.6%) > -0.3% ( -9% -9%) > MedSpanNear 60.50 (3.7%) 60.36 (4.8%) > -0.2% ( -8% -8%) > HighSpanNear 19.74 (4.5%) 19.72 (5.1%) > -0.1% ( -9% -9%) > LowSpanNear 101.93 (3.2%) 101.82 (4.4%) > -0.1% ( -7% -7%) > AndHighMed 366.18 (1.7%) 366.93 (1.7%) > 0.2% ( -3% -3%) > PKLookup 237.28 (4.0%) 237.96 (4.2%) > 0.3% ( -7% -8%) >MedPhrase 173.17 (4.7%) 174.69 (4.7%) > 0.9% ( -8% - 10%) > LowSloppyPhrase 180.91 (2.6%) 182.79 (2.7%) > 1.0% ( -4% -6%) >LowPhrase 374.64 (5.5%) 379.11 (5.8%) > 1.2% ( -9% - 13%) > 
HighTerm 253.14 (7.9%) 256.97 (11.4%) > 1.5% ( -16% - 22%) > HighPhrase 19.52 (10.6%) 19.83 (11.0%) > 1.6% ( -18% - 25%) > MedSloppyPhrase 141.90 (2.6%) 144.11 (2.5%) > 1.6% (
[jira] [Commented] (LUCENE-4702) Terms dictionary compression
[ https://issues.apache.org/jira/browse/LUCENE-4702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023871#comment-17023871 ] Adrien Grand commented on LUCENE-4702: -- Nightly benchmarks are seeing worse slowdowns than what I was observing locally: - slower indexing http://people.apache.org/~mikemccand/lucenebench/indexing.html - slower fuzzy queries http://people.apache.org/~mikemccand/lucenebench/Fuzzy1.html - slower wildcard queries http://people.apache.org/~mikemccand/lucenebench/Wildcard.html - slower respell http://people.apache.org/~mikemccand/lucenebench/Respell.html Interestingly PK lookups got faster (!) http://people.apache.org/~mikemccand/lucenebench/PKLookup.html and prefix queries are only barely slower http://people.apache.org/~mikemccand/lucenebench/Prefix3.html. I'll look into it. > Terms dictionary compression > > > Key: LUCENE-4702 > URL: https://issues.apache.org/jira/browse/LUCENE-4702 > Project: Lucene - Core > Issue Type: Wish >Reporter: Adrien Grand >Assignee: Adrien Grand >Priority: Trivial > Attachments: LUCENE-4702.patch, LUCENE-4702.patch > > Time Spent: 3.5h > Remaining Estimate: 0h > > I've done a quick test with the block tree terms dictionary by replacing a > call to IndexOutput.writeBytes to write suffix bytes with a call to > LZ4.compressHC to test the peformance hit. Interestingly, search performance > was very good (see comparison table below) and the tim files were 14% smaller > (from 150432 bytes overall to 129516). 
> {noformat} > TaskQPS baseline StdDevQPS compressed StdDev > Pct diff > Fuzzy1 111.50 (2.0%) 78.78 (1.5%) > -29.4% ( -32% - -26%) > Fuzzy2 36.99 (2.7%) 28.59 (1.5%) > -22.7% ( -26% - -18%) > Respell 122.86 (2.1%) 103.89 (1.7%) > -15.4% ( -18% - -11%) > Wildcard 100.58 (4.3%) 94.42 (3.2%) > -6.1% ( -13% -1%) > Prefix3 124.90 (5.7%) 122.67 (4.7%) > -1.8% ( -11% -9%) >OrHighLow 169.87 (6.8%) 167.77 (8.0%) > -1.2% ( -15% - 14%) > LowTerm 1949.85 (4.5%) 1929.02 (3.4%) > -1.1% ( -8% -7%) > AndHighLow 2011.95 (3.5%) 1991.85 (3.3%) > -1.0% ( -7% -5%) > OrHighHigh 155.63 (6.7%) 154.12 (7.9%) > -1.0% ( -14% - 14%) > AndHighHigh 341.82 (1.2%) 339.49 (1.7%) > -0.7% ( -3% -2%) >OrHighMed 217.55 (6.3%) 216.16 (7.1%) > -0.6% ( -13% - 13%) > IntNRQ 53.10 (10.9%) 52.90 (8.6%) > -0.4% ( -17% - 21%) > MedTerm 998.11 (3.8%) 994.82 (5.6%) > -0.3% ( -9% -9%) > MedSpanNear 60.50 (3.7%) 60.36 (4.8%) > -0.2% ( -8% -8%) > HighSpanNear 19.74 (4.5%) 19.72 (5.1%) > -0.1% ( -9% -9%) > LowSpanNear 101.93 (3.2%) 101.82 (4.4%) > -0.1% ( -7% -7%) > AndHighMed 366.18 (1.7%) 366.93 (1.7%) > 0.2% ( -3% -3%) > PKLookup 237.28 (4.0%) 237.96 (4.2%) > 0.3% ( -7% -8%) >MedPhrase 173.17 (4.7%) 174.69 (4.7%) > 0.9% ( -8% - 10%) > LowSloppyPhrase 180.91 (2.6%) 182.79 (2.7%) > 1.0% ( -4% -6%) >LowPhrase 374.64 (5.5%) 379.11 (5.8%) > 1.2% ( -9% - 13%) > HighTerm 253.14 (7.9%) 256.97 (11.4%) > 1.5% ( -16% - 22%) > HighPhrase 19.52 (10.6%) 19.83 (11.0%) > 1.6% ( -18% - 25%) > MedSloppyPhrase 141.90 (2.6%) 144.11 (2.5%) > 1.6% ( -3% -6%) > HighSloppyPhrase 25.26 (4.8%) 25.97 (5.0%) > 2.8% ( -6% - 13%) > {noformat} > Only queries which are very terms-dictionary-intensive got a performance hit > (Fuzzy, Fuzzy2, Respell, Wildcard), other queries including Prefix3 behaved > (surprisingly) well. > Do you think of it as something worth exploring? 
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
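The experiment described in LUCENE-4702 — compressing each block's concatenated term suffixes instead of writing them raw — can be sketched generically. The patch uses Lucene's LZ4.compressHC; the stand-in below uses java.util.zip.Deflater so it runs standalone, and the byte layout is illustrative, not BlockTree's actual on-disk format:

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.Deflater;

public class SuffixCompressionSketch {

    // Compress a block of concatenated term suffixes, standing in for the
    // patch's LZ4.compressHC call that replaces IndexOutput.writeBytes.
    // Returns the compressed size in bytes.
    static int compressedSize(byte[] suffixBytes) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(suffixBytes);
        deflater.finish();
        // Output buffer generously sized so one deflate() call suffices.
        byte[] out = new byte[suffixBytes.length * 2 + 64];
        int n = deflater.deflate(out);
        deflater.end();
        return n;
    }

    public static void main(String[] args) {
        // Term suffixes within a block tend to share structure,
        // which is why the .tim files shrank by ~14% in the test.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 100; i++) sb.append("suffix_").append(i % 10);
        byte[] block = sb.toString().getBytes(StandardCharsets.UTF_8);
        System.out.println(block.length + " -> " + compressedSize(block));
    }
}
```

The benchmark trade-off follows directly: terms-dictionary-intensive queries (Fuzzy, Respell, Wildcard) pay the decompression cost on every block visit, while queries that touch few blocks barely notice.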
[jira] [Commented] (LUCENE-9146) Switch GitHub PR test from ant precommit to gradle
[ https://issues.apache.org/jira/browse/LUCENE-9146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023872#comment-17023872 ] Dawid Weiss commented on LUCENE-9146: - I think it can run both, at least at the beginning? > Switch GitHub PR test from ant precommit to gradle > -- > > Key: LUCENE-9146 > URL: https://issues.apache.org/jira/browse/LUCENE-9146 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Mike Drob >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (LUCENE-9140) Clean up Solr dependencies to use transitives and explicit exclusions
[ https://issues.apache.org/jira/browse/LUCENE-9140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Weiss updated LUCENE-9140: Priority: Critical (was: Major) > Clean up Solr dependencies to use transitives and explicit exclusions > - > > Key: LUCENE-9140 > URL: https://issues.apache.org/jira/browse/LUCENE-9140 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Dawid Weiss >Priority: Critical > > Many Solr dependencies in gradle are currently explicitly expanded into a > flat structure with { transitive = false }, reflecting ivy/ant build. > We should explicitly depend on what's really needed, allow for transitive > dependencies and exclude what's not required. This will make the dependency > graph clearer. We still have the warning check for creeping transitive > dependencies in the form of versions lock file and jar checksums. > A side effect would also be to figure out which scope dependencies belong to > (api level or internal). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Created] (LUCENE-9174) Bump default gradle memory to 2g
Dawid Weiss created LUCENE-9174: --- Summary: Bump default gradle memory to 2g Key: LUCENE-9174 URL: https://issues.apache.org/jira/browse/LUCENE-9174 Project: Lucene - Core Issue Type: Task Reporter: Dawid Weiss I see these from time to time so I'll bump the daemon's heap to 2 gigs. Don't know why it needs so much... {code} Expiring Daemon because JVM heap space is exhausted Daemon will be stopped at the end of the build after running out of JVM memory {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
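A daemon heap bump like this typically lands in the repo's gradle.properties; a representative entry would be along these lines (the exact flags in the committed change may differ):

```properties
# Give the Gradle daemon more heap so large multi-project builds
# don't die with "Expiring Daemon because JVM heap space is exhausted".
org.gradle.jvmargs=-Xmx2g
```

Gradle reads org.gradle.jvmargs when forking the daemon, so the setting applies to every build without per-invocation flags.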
[jira] [Commented] (LUCENE-9174) Bump default gradle memory to 2g
[ https://issues.apache.org/jira/browse/LUCENE-9174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023875#comment-17023875 ] ASF subversion and git services commented on LUCENE-9174: - Commit 6f85ec04602aa083b4512667d37e36e0213b5c35 in lucene-solr's branch refs/heads/master from Dawid Weiss [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=6f85ec0 ] LUCENE-9174: Bump default gradle memory to 2g > Bump default gradle memory to 2g > > > Key: LUCENE-9174 > URL: https://issues.apache.org/jira/browse/LUCENE-9174 > Project: Lucene - Core > Issue Type: Task >Reporter: Dawid Weiss >Priority: Major > > I see these from time to time so I'll bump the daemon's heap to 2 gigs. Don't > know why it needs to much... > {code} > Expiring Daemon because JVM heap space is exhausted > Daemon will be stopped at the end of the build after running out of JVM memory > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-9174) Bump default gradle memory to 2g
[ https://issues.apache.org/jira/browse/LUCENE-9174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Weiss resolved LUCENE-9174. - Assignee: Dawid Weiss Resolution: Fixed > Bump default gradle memory to 2g > > > Key: LUCENE-9174 > URL: https://issues.apache.org/jira/browse/LUCENE-9174 > Project: Lucene - Core > Issue Type: Task >Reporter: Dawid Weiss >Assignee: Dawid Weiss >Priority: Major > > I see these from time to time so I'll bump the daemon's heap to 2 gigs. Don't > know why it needs to much... > {code} > Expiring Daemon because JVM heap space is exhausted > Daemon will be stopped at the end of the build after running out of JVM memory > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-11207) Add OWASP dependency checker to detect security vulnerabilities in third party libraries
[ https://issues.apache.org/jira/browse/SOLR-11207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023881#comment-17023881 ] ASF subversion and git services commented on SOLR-11207: Commit 74a8d6d5acc67e4d5c6eeb640b8de3f820f0774b in lucene-solr's branch refs/heads/gradle-master from Jan Høydahl [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=74a8d6d ] SOLR-11207: Add OWASP dependency checker to gradle build (#1121) * SOLR-11207: Add OWASP dependency checker to gradle build > Add OWASP dependency checker to detect security vulnerabilities in third > party libraries > > > Key: SOLR-11207 > URL: https://issues.apache.org/jira/browse/SOLR-11207 > Project: Solr > Issue Type: Improvement > Components: Build >Affects Versions: 6.0 >Reporter: Hrishikesh Gadre >Assignee: Jan Høydahl >Priority: Major > Time Spent: 3h 20m > Remaining Estimate: 0h > > Lucene/Solr project depends on number of third party libraries. Some of those > libraries contain security vulnerabilities. Upgrading to versions of those > libraries that have fixes for those vulnerabilities is a simple, critical > step we can take to improve the security of the system. But for that we need > a tool which can scan the Lucene/Solr dependencies and look up the security > database for known vulnerabilities. > I found that [OWASP > dependency-checker|https://jeremylong.github.io/DependencyCheck/dependency-check-ant/] > can be used for this purpose. It provides a ant task which we can include in > the Lucene/Solr build. We also need to figure out how (and when) to invoke > this dependency-checker. But this can be figured out once we complete the > first step of integrating this tool with the Lucene/Solr build system. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-12930) Add developer documentation to source repo
[ https://issues.apache.org/jira/browse/SOLR-12930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023878#comment-17023878 ] ASF subversion and git services commented on SOLR-12930: Commit 74e88deba78ea40f81c3072d6e014903773f4e92 in lucene-solr's branch refs/heads/gradle-master from Cassandra Targett [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=74e88de ] Revert "SOLR-12930: move Gradle docs from ./help/ to new ./dev-docs/ directory" This reverts commit 2d8650d36cc65b3161f009be85fcfd2fa8ff637c. > Add developer documentation to source repo > -- > > Key: SOLR-12930 > URL: https://issues.apache.org/jira/browse/SOLR-12930 > Project: Solr > Issue Type: Improvement > Components: Tests >Reporter: Mark Miller >Priority: Major > Attachments: solr-dev-docs.zip > > Time Spent: 1h 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-11207) Add OWASP dependency checker to detect security vulnerabilities in third party libraries
[ https://issues.apache.org/jira/browse/SOLR-11207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023882#comment-17023882 ] ASF subversion and git services commented on SOLR-11207: Commit 5ab59f59ac48c00c7f2047a92a5c7c0451490cf1 in lucene-solr's branch refs/heads/gradle-master from Dawid Weiss [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=5ab59f5 ] SOLR-11207: minor changes: - added 'owasp' task to the root project. This depends on dependencyCheckAggregate which seems to be a better fit for multi-module projects than dependencyCheckAnalyze (the difference is vague to me from plugin's documentation). - you can run the "gradlew owasp" task explicitly and it'll run the validation without any flags. - the owasp task is only added to check if validation.owasp property is true. I think this should stay as the default on non-CI systems (developer defaults) because it's a significant chunk of time it takes to download and validate dependencies. - I'm not sure *all* configurations should be included in the check... perhaps we should only limit ourselves to actual runtime dependencies not build dependencies, solr-ref-guide, etc. > Add OWASP dependency checker to detect security vulnerabilities in third > party libraries > > > Key: SOLR-11207 > URL: https://issues.apache.org/jira/browse/SOLR-11207 > Project: Solr > Issue Type: Improvement > Components: Build >Affects Versions: 6.0 >Reporter: Hrishikesh Gadre >Assignee: Jan Høydahl >Priority: Major > Time Spent: 3h 20m > Remaining Estimate: 0h > > Lucene/Solr project depends on number of third party libraries. Some of those > libraries contain security vulnerabilities. Upgrading to versions of those > libraries that have fixes for those vulnerabilities is a simple, critical > step we can take to improve the security of the system. But for that we need > a tool which can scan the Lucene/Solr dependencies and look up the security > database for known vulnerabilities. 
> I found that [OWASP > dependency-checker|https://jeremylong.github.io/DependencyCheck/dependency-check-ant/] > can be used for this purpose. It provides a ant task which we can include in > the Lucene/Solr build. We also need to figure out how (and when) to invoke > this dependency-checker. But this can be figured out once we complete the > first step of integrating this tool with the Lucene/Solr build system. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
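Dawid's follow-up commit above translates to roughly the following in the root build script. This is a sketch based on his comment, not the committed code verbatim; the property name (validation.owasp) and task wiring follow his description, and dependencyCheckAggregate is the task contributed by the OWASP dependency-check Gradle plugin:

```groovy
// Only wire up the expensive OWASP check when explicitly requested,
// e.g. `gradlew -Pvalidation.owasp=true owasp` on CI, because
// downloading and validating the CVE database takes significant time.
if (Boolean.parseBoolean(project.findProperty("validation.owasp")?.toString() ?: "false")) {
  tasks.register("owasp") {
    // Aggregate fits multi-module projects better than dependencyCheckAnalyze.
    dependsOn "dependencyCheckAggregate"
  }
}
```

Keeping the default off for local builds preserves fast developer defaults while letting CI opt in.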
[jira] [Commented] (SOLR-14214) Ref Guide: Clean up info about clients other than SolrJ
[ https://issues.apache.org/jira/browse/SOLR-14214?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023877#comment-17023877 ] ASF subversion and git services commented on SOLR-14214: Commit ba77a5f2eb13ffb418b84dac1df957dc3e9e2247 in lucene-solr's branch refs/heads/gradle-master from Cassandra Targett [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=ba77a5f ] SOLR-14214: Clean up client lists and references > Ref Guide: Clean up info about clients other than SolrJ > --- > > Key: SOLR-14214 > URL: https://issues.apache.org/jira/browse/SOLR-14214 > Project: Solr > Issue Type: Improvement > Components: documentation >Reporter: Cassandra Targett >Priority: Major > > The Ref Guide page client-api-lineup.adoc may have been updated at some point > since Nov 2011, the last time it says it was updated, but I would guess > probably not very recently. > It really would be worth going through the list to see which ones are still > active and removing those that would not work with modern versions of Solr > (say, 6.x or 7.x+?). > My personal POV is that all info on clients should be kept in the Wiki > (cwiki) and the Ref Guide merely link to that - that would allow client > maintainers to keep info about their clients up to date without needing to be > a committer in order to update the Ref Guide. > That approach would mean pretty much removing everything from the > client-api-lineup.adoc page, and also likely removing most if not all of the > other Client pages for Ruby, Python, and JS. > However it plays out, we should take a look at those pages and update > according to the current state of the client universe. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9174) Bump default gradle memory to 2g
[ https://issues.apache.org/jira/browse/LUCENE-9174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023885#comment-17023885 ] ASF subversion and git services commented on LUCENE-9174: - Commit 6f85ec04602aa083b4512667d37e36e0213b5c35 in lucene-solr's branch refs/heads/gradle-master from Dawid Weiss [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=6f85ec0 ] LUCENE-9174: Bump default gradle memory to 2g > Bump default gradle memory to 2g > > > Key: LUCENE-9174 > URL: https://issues.apache.org/jira/browse/LUCENE-9174 > Project: Lucene - Core > Issue Type: Task >Reporter: Dawid Weiss >Assignee: Dawid Weiss >Priority: Major > > I see these from time to time so I'll bump the daemon's heap to 2 gigs. Don't > know why it needs to much... > {code} > Expiring Daemon because JVM heap space is exhausted > Daemon will be stopped at the end of the build after running out of JVM memory > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-13749) Implement support for joining across collections with multiple shards ( XCJF )
[ https://issues.apache.org/jira/browse/SOLR-13749?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023879#comment-17023879 ] ASF subversion and git services commented on SOLR-13749: Commit 127ce3e360ad88cb0a77a58d81eb09df00c04045 in lucene-solr's branch refs/heads/gradle-master from Gus Heck [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=127ce3e ] SOLR-13749 adjust changes to reflect backport to 8.5 > Implement support for joining across collections with multiple shards ( XCJF ) > -- > > Key: SOLR-13749 > URL: https://issues.apache.org/jira/browse/SOLR-13749 > Project: Solr > Issue Type: New Feature >Reporter: Kevin Watters >Assignee: Gus Heck >Priority: Major > Fix For: 8.5 > > Time Spent: 1h 10m > Remaining Estimate: 0h > > This ticket includes 2 query parsers. > The first one is the "Cross collection join filter" (XCJF) parser. This is > the "Cross-collection join filter" query parser. It can do a call out to a > remote collection to get a set of join keys to be used as a filter against > the local collection. > The second one is the Hash Range query parser that you can specify a field > name and a hash range, the result is that only the documents that would have > hashed to that range will be returned. > This query parser will do an intersection based on join keys between 2 > collections. > The local collection is the collection that you are searching against. > The remote collection is the collection that contains the join keys that you > want to use as a filter. > Each shard participating in the distributed request will execute a query > against the remote collection. If the local collection is setup with the > compositeId router to be routed on the join key field, a hash range query is > applied to the remote collection query to only match the documents that > contain a potential match for the documents that are in the local shard/core. 
> > > Here's some vocab to help with the descriptions of the various parameters. > ||Term||Description|| > |Local Collection|This is the main collection that is being queried.| > |Remote Collection|This is the collection that the XCJFQuery will query to > resolve the join keys.| > |XCJFQuery|The lucene query that executes a search to get back a set of join > keys from a remote collection| > |HashRangeQuery|The lucene query that matches only the documents whose hash > code on a field falls within a specified range.| > > > ||Param ||Required ||Description|| > |collection|Required|The name of the external Solr collection to be queried > to retrieve the set of join key values ( required )| > |zkHost|Optional|The connection string to be used to connect to Zookeeper. > zkHost and solrUrl are both optional parameters, and at most one of them > should be specified. > If neither of zkHost or solrUrl are specified, the local Zookeeper cluster > will be used. ( optional )| > |solrUrl|Optional|The URL of the external Solr node to be queried ( optional > )| > |from|Required|The join key field name in the external collection ( required > )| > |to|Required|The join key field name in the local collection| > |v|See Note|The query to be executed against the external Solr collection to > retrieve the set of join key values. > Note: The original query can be passed at the end of the string or as the > "v" parameter. > It's recommended to use query parameter substitution with the "v" parameter > to ensure no issues arise with the default query parsers.| > |routed| |true / false. If true, the XCJF query will use each shard's hash > range to determine the set of join keys to retrieve for that shard. > This parameter improves the performance of the cross-collection join, but > it depends on the local collection being routed by the toField. 
If this > parameter is not specified, > the XCJF query will try to determine the correct value automatically.| > |ttl| |The length of time that an XCJF query in the cache will be considered > valid, in seconds. Defaults to 3600 (one hour). > The XCJF query will not be aware of changes to the remote collection, so > if the remote collection is updated, cached XCJF queries may give inaccurate > results. > After the ttl period has expired, the XCJF query will re-execute the join > against the remote collection.| > |_All others_| |Any normal Solr parameter can also be specified as a local > param.| > > Example Solr Config.xml changes: > > {code:xml} > <cache name="hash_vin" > class="solr.LRUCache" > size="128" > initialSize="0" > regenerator="solr.NoOpRegenerator"/> > {code} > > <queryP
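Reading the parameter table above together, a request using the cross-collection join filter would look roughly like this. The parser name (xcjf) and the collection/field names here are illustrative assumptions based on the issue description, not verified against the committed syntax:

```text
q=*:*
&fq={!xcjf collection=remoteCollection from=vin to=vin v=$joinQuery}
&joinQuery=model:Camry
```

The remote query is passed via the v parameter with substitution, as the description recommends, to avoid parsing issues with the default query parsers.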
[jira] [Commented] (SOLR-14189) Some whitespace characters bypass zero-length test in query parsers leading to 400 Bad Request
[ https://issues.apache.org/jira/browse/SOLR-14189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023883#comment-17023883 ] ASF subversion and git services commented on SOLR-14189: Commit efd0e8f3e89a954fcb870c9fab18cf19bcdbf97e in lucene-solr's branch refs/heads/gradle-master from andywebb1975 [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=efd0e8f ] SOLR-14189 switch from String.trim() to StringUtils.isBlank() (#1172) > Some whitespace characters bypass zero-length test in query parsers leading > to 400 Bad Request > -- > > Key: SOLR-14189 > URL: https://issues.apache.org/jira/browse/SOLR-14189 > Project: Solr > Issue Type: Improvement > Components: query parsers >Reporter: Andy Webb >Assignee: Uwe Schindler >Priority: Major > Fix For: master (9.0), 8.5 > > Time Spent: 2h > Remaining Estimate: 0h > > The edismax and some other query parsers treat pure whitespace queries as > empty queries, but they use Java's {{String.trim()}} method to normalise > queries. That method [only treats characters 0-32 as > whitespace|https://docs.oracle.com/javase/8/docs/api/java/lang/String.html#trim--]. > Other whitespace characters exist - such as {{U+3000 IDEOGRAPHIC SPACE}} - > which bypass the test and lead to {{400 Bad Request}} responses - see for > example {{/solr/mycollection/select?q=%E3%80%80&defType=edismax}} vs > {{/solr/mycollection/select?q=%20&defType=edismax}}. The first fails with the > exception: > {noformat} > org.apache.solr.search.SyntaxError: Cannot parse '': Encountered "" at > line 1, column 0. Was expecting one of: ... "+" ... "-" ... > ... "(" ... "*" ... ... ... ... ... > ... "[" ... "{" ... ... "filter(" ... ... > ... 
> {noformat} > [PR 1172|https://github.com/apache/lucene-solr/pull/1172] updates the dismax, > edismax and rerank query parsers to use > [StringUtils.isWhitespace()|https://commons.apache.org/proper/commons-lang/apidocs/org/apache/commons/lang3/StringUtils.html#isWhitespace-java.lang.String-] > which is aware of all whitespace characters. > Prior to the change, rerank behaves differently for U+3000 and U+0020 - with > the change, both the below give the "mandatory parameter" message: > {{q=greetings&rq=\{!rerank%20reRankQuery=$rqq%20reRankDocs=1000%20reRankWeight=3}&rqq=%E3%80%80}} > - generic 400 Bad Request > {{q=greetings&rq=\{!rerank%20reRankQuery=$rqq%20reRankDocs=1000%20reRankWeight=3}&rqq=%20}} > - 400 reporting "reRankQuery parameter is mandatory" -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
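The trim-vs-whitespace gap in SOLR-14189 is easy to demonstrate in plain Java, no Solr required: String.trim() only strips code points up to U+0020, while Character.isWhitespace recognizes U+3000, which is the same notion of whitespace that commons-lang's StringUtils.isBlank builds on. The isBlank below is a JDK-only stand-in for the commons-lang method:

```java
public class WhitespaceCheck {

    // StringUtils.isBlank-style check using only the JDK: true when the
    // string is empty or every code point is whitespace per Unicode.
    static boolean isBlank(String s) {
        return s.codePoints().allMatch(Character::isWhitespace);
    }

    public static void main(String[] args) {
        String ideographicSpace = "\u3000"; // U+3000 IDEOGRAPHIC SPACE
        // trim() only strips chars <= U+0020, so this stays non-empty...
        System.out.println(ideographicSpace.trim().isEmpty()); // false
        // ...which is how it slipped past the parsers' zero-length test.
        System.out.println(isBlank(ideographicSpace)); // true
    }
}
```

This is why the q=%E3%80%80 request reached the query parser as a "non-empty" query and blew up, while q=%20 was short-circuited as empty.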
[jira] [Commented] (SOLR-14189) Some whitespace characters bypass zero-length test in query parsers leading to 400 Bad Request
[ https://issues.apache.org/jira/browse/SOLR-14189?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023884#comment-17023884 ] ASF subversion and git services commented on SOLR-14189: Commit fd49c903b8193aa27c56655915c1bf741135fa18 in lucene-solr's branch refs/heads/gradle-master from Uwe Schindler [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=fd49c90 ] SOLR-14189: Add changes entry > Some whitespace characters bypass zero-length test in query parsers leading > to 400 Bad Request > -- > > Key: SOLR-14189 > URL: https://issues.apache.org/jira/browse/SOLR-14189 > Project: Solr > Issue Type: Improvement > Components: query parsers >Reporter: Andy Webb >Assignee: Uwe Schindler >Priority: Major > Fix For: master (9.0), 8.5 > > Time Spent: 2h > Remaining Estimate: 0h > > The edismax and some other query parsers treat pure whitespace queries as > empty queries, but they use Java's {{String.trim()}} method to normalise > queries. That method [only treats characters 0-32 as > whitespace|https://docs.oracle.com/javase/8/docs/api/java/lang/String.html#trim--]. > Other whitespace characters exist - such as {{U+3000 IDEOGRAPHIC SPACE}} - > which bypass the test and lead to {{400 Bad Request}} responses - see for > example {{/solr/mycollection/select?q=%E3%80%80&defType=edismax}} vs > {{/solr/mycollection/select?q=%20&defType=edismax}}. The first fails with the > exception: > {noformat} > org.apache.solr.search.SyntaxError: Cannot parse '': Encountered "" at > line 1, column 0. Was expecting one of: ... "+" ... "-" ... > ... "(" ... "*" ... ... ... ... ... > ... "[" ... "{" ... ... "filter(" ... ... > ... 
> {noformat} > [PR 1172|https://github.com/apache/lucene-solr/pull/1172] updates the dismax, > edismax and rerank query parsers to use > [StringUtils.isWhitespace()|https://commons.apache.org/proper/commons-lang/apidocs/org/apache/commons/lang3/StringUtils.html#isWhitespace-java.lang.String-] > which is aware of all whitespace characters. > Prior to the change, rerank behaves differently for U+3000 and U+0020 - with > the change, both the below give the "mandatory parameter" message: > {{q=greetings&rq=\{!rerank%20reRankQuery=$rqq%20reRankDocs=1000%20reRankWeight=3}&rqq=%E3%80%80}} > - generic 400 Bad Request > {{q=greetings&rq=\{!rerank%20reRankQuery=$rqq%20reRankDocs=1000%20reRankWeight=3}&rqq=%20}} > - 400 reporting "reRankQuery parameter is mandatory" -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
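The trim-vs-whitespace distinction described above is easy to reproduce in plain Java. This sketch uses the JDK's own Character.isWhitespace() rather than the commons-lang StringUtils.isWhitespace() that PR 1172 adopts, but both recognize U+3000 where String.trim() does not:

```java
public class WhitespaceDemo {
    public static void main(String[] args) {
        String ideographic = "\u3000"; // U+3000 IDEOGRAPHIC SPACE
        String ascii = " ";            // U+0020 SPACE

        // String.trim() only strips code points <= U+0020, so the
        // ideographic space survives and the query looks non-empty:
        System.out.println(ideographic.trim().isEmpty()); // false
        System.out.println(ascii.trim().isEmpty());       // true

        // A Unicode-aware check (similar in spirit to commons-lang's
        // StringUtils.isWhitespace()) treats both as pure whitespace:
        System.out.println(ideographic.chars().allMatch(Character::isWhitespace)); // true
        System.out.println(ascii.chars().allMatch(Character::isWhitespace));       // true
    }
}
```

This is why the pre-patch parsers pass a lone U+3000 through to the Lucene query parser, which then fails with the SyntaxError shown above.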
[jira] [Commented] (SOLR-11207) Add OWASP dependency checker to detect security vulnerabilities in third party libraries
[ https://issues.apache.org/jira/browse/SOLR-11207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023880#comment-17023880 ] ASF subversion and git services commented on SOLR-11207: Commit 74a8d6d5acc67e4d5c6eeb640b8de3f820f0774b in lucene-solr's branch refs/heads/gradle-master from Jan Høydahl [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=74a8d6d ] SOLR-11207: Add OWASP dependency checker to gradle build (#1121) * SOLR-11207: Add OWASP dependency checker to gradle build > Add OWASP dependency checker to detect security vulnerabilities in third > party libraries > > > Key: SOLR-11207 > URL: https://issues.apache.org/jira/browse/SOLR-11207 > Project: Solr > Issue Type: Improvement > Components: Build >Affects Versions: 6.0 >Reporter: Hrishikesh Gadre >Assignee: Jan Høydahl >Priority: Major > Time Spent: 3h 20m > Remaining Estimate: 0h > > The Lucene/Solr project depends on a number of third party libraries. Some of those > libraries contain security vulnerabilities. Upgrading to versions of those > libraries that have fixes for those vulnerabilities is a simple, critical > step we can take to improve the security of the system. But for that we need > a tool which can scan the Lucene/Solr dependencies and look up the security > database for known vulnerabilities. > I found that [OWASP > dependency-checker|https://jeremylong.github.io/DependencyCheck/dependency-check-ant/] > can be used for this purpose. It provides an ant task which we can include in > the Lucene/Solr build. We also need to figure out how (and when) to invoke > this dependency-checker. But this can be figured out once we complete the > first step of integrating this tool with the Lucene/Solr build system. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9174) Bump default gradle memory to 2g
[ https://issues.apache.org/jira/browse/LUCENE-9174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023895#comment-17023895 ] Robert Muir commented on LUCENE-9174: - The problem isn't the heap, the problem is the daemon. I've been running builds almost constantly for many days now (daemon disabled) with 1GB; no issue. > Bump default gradle memory to 2g > > > Key: LUCENE-9174 > URL: https://issues.apache.org/jira/browse/LUCENE-9174 > Project: Lucene - Core > Issue Type: Task >Reporter: Dawid Weiss >Assignee: Dawid Weiss >Priority: Major > > I see these from time to time so I'll bump the daemon's heap to 2 gigs. Don't > know why it needs so much... > {code} > Expiring Daemon because JVM heap space is exhausted > Daemon will be stopped at the end of the build after running out of JVM memory > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9134) Port ant-regenerate tasks to Gradle build
[ https://issues.apache.org/jira/browse/LUCENE-9134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023900#comment-17023900 ] ASF subversion and git services commented on LUCENE-9134: - Commit c226207842e6237305cacf26bc0add6239a82aa3 in lucene-solr's branch refs/heads/gradle-master from Dawid Weiss [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=c226207 ] LUCENE-9134: lucene:core:jflexStandardTokenizerImpl > Port ant-regenerate tasks to Gradle build > - > > Key: LUCENE-9134 > URL: https://issues.apache.org/jira/browse/LUCENE-9134 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Erick Erickson >Assignee: Erick Erickson >Priority: Major > Attachments: LUCENE-9134.patch, core_regen.patch > > Time Spent: 5h 20m > Remaining Estimate: 0h > > Here are the "regenerate" targets I found in the ant version. There are a > couple that I don't have evidence for or against being rebuilt > // Very top level > {code:java} > ./build.xml: > ./build.xml: failonerror="true"> > ./build.xml: depends="regenerate,-check-after-regeneration"/> > {code} > // top level Lucene. This includes the core/build.xml and > test-framework/build.xml files > {code:java} > ./lucene/build.xml: > ./lucene/build.xml: inheritall="false"> > ./lucene/build.xml: > {code} > // This one has quite a number of customizations to > {code:java} > ./lucene/core/build.xml: depends="createLevAutomata,createPackedIntSources,jflex"/> > {code} > // This one has a bunch of code modifications _after_ javacc is run on > certain of the > // output files. Save this one for last? > {code:java} > ./lucene/queryparser/build.xml: > {code} > // the files under ../lucene/analysis... are pretty self contained. 
I expect > these could be done as a unit > {code:java} > ./lucene/analysis/build.xml: > ./lucene/analysis/build.xml: > ./lucene/analysis/common/build.xml: depends="jflex,unicode-data"/> > ./lucene/analysis/icu/build.xml: depends="gen-utr30-data-files,gennorm2,genrbbi"/> > ./lucene/analysis/kuromoji/build.xml: depends="build-dict"/> > ./lucene/analysis/nori/build.xml: depends="build-dict"/> > ./lucene/analysis/opennlp/build.xml: depends="train-test-models"/> > {code} > > // These _are_ regenerated from the top-level regenerate target, but for -- > LUCENE-9080//the changes were only in imports so there are no > //corresponding files checked in in that JIRA > {code:java} > ./lucene/expressions/build.xml: depends="run-antlr"/> > {code} > // Apparently unrelated to ./lucene/analysis/opennlp/build.xml > "train-test-models" target > // Apparently not rebuilt from the top level, but _are_ regenerated when > executed from > // ./solr/contrib/langid > {code:java} > ./solr/contrib/langid/build.xml: depends="train-test-models"/> > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Created] (LUCENE-9175) gradle build leaks tons of gradle-worker-classpath* files in tmpdir
Robert Muir created LUCENE-9175: --- Summary: gradle build leaks tons of gradle-worker-classpath* files in tmpdir Key: LUCENE-9175 URL: https://issues.apache.org/jira/browse/LUCENE-9175 Project: Lucene - Core Issue Type: Task Reporter: Robert Muir This may be a sign of classloader issues or similar that cause other issues like LUCENE-9174? {noformat} $ ls /tmp/gradle-worker-classpath* | wc -l 523 {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-9123) JapaneseTokenizer with search mode doesn't work with SynonymGraphFilter
[ https://issues.apache.org/jira/browse/LUCENE-9123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023778#comment-17023778 ] Tomoko Uchida edited comment on LUCENE-9123 at 1/26/20 6:47 PM: When reproducing this issue I noticed that JapaneseTokenizer (mode=search) gives positionIncrements=1 for the decompounded token "株式" instead of 0. This looks strange to me, is this an expected behaviour? If not, this may affect the synonyms handling? And please ignore my previous comment... I was mistaken about position increment. was (Author: tomoko uchida): When reproducing this issue I noticed that JapaneseTokenizer (mode=search) gives positionIncrements=1 for the decompounded token "株式" instead of 0. This looks strange to me, is this an expected behaviour? If not, this may affect the synonyms handling? > JapaneseTokenizer with search mode doesn't work with SynonymGraphFilter > --- > > Key: LUCENE-9123 > URL: https://issues.apache.org/jira/browse/LUCENE-9123 > Project: Lucene - Core > Issue Type: Bug > Components: modules/analysis >Affects Versions: 8.4 >Reporter: Kazuaki Hiraga >Assignee: Tomoko Uchida >Priority: Major > Attachments: LUCENE-9123.patch, LUCENE-9123_8x.patch > > > JapaneseTokenizer with `mode=search` or `mode=extended` doesn't work with > both of SynonymGraphFilter and SynonymFilter when JT generates multiple > tokens as an output. If we use `mode=normal`, it should be fine. However, we > would like to use decomposed tokens that can maximize the chance to increase > recall. 
> Snippet of schema: > {code:xml} > positionIncrementGap="100" autoGeneratePhraseQueries="false"> > > > synonyms="lang/synonyms_ja.txt" > tokenizerFactory="solr.JapaneseTokenizerFactory"/> > > > tags="lang/stoptags_ja.txt" /> > > > > > > minimumLength="4"/> > > > > > {code} > A synonym entry that generates the error: > {noformat} > 株式会社,コーポレーション > {noformat} > The following is the output on the console: > {noformat} > $ ./bin/solr create_core -c jp_test -d ../config/solrconfs > ERROR: Error CREATEing SolrCore 'jp_test': Unable to create core [jp_test3] > Caused by: term: 株式会社 analyzed to a token (株式会社) with position increment != 1 > (got: 0) > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-9123) JapaneseTokenizer with search mode doesn't work with SynonymGraphFilter
[ https://issues.apache.org/jira/browse/LUCENE-9123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023778#comment-17023778 ] Tomoko Uchida edited comment on LUCENE-9123 at 1/26/20 6:49 PM: When reproducing this issue I noticed that JapaneseTokenizer (mode=search) gives positionIncrements=1 for the decompounded token "株式" instead of 0. This looks strange to me, is this an expected behaviour? If not, this may affect the synonyms handling? please ignore my comment above... I was mistaken about position increment. was (Author: tomoko uchida): When reproducing this issue I noticed that JapaneseTokenizer (mode=search) gives positionIncrements=1 for the decompounded token "株式" instead of 0. This looks strange to me, is this an expected behaviour? If not, this may affect the synonyms handling? And please ignore my previous comment... I was mistaken about position increment. > JapaneseTokenizer with search mode doesn't work with SynonymGraphFilter > --- > > Key: LUCENE-9123 > URL: https://issues.apache.org/jira/browse/LUCENE-9123 > Project: Lucene - Core > Issue Type: Bug > Components: modules/analysis >Affects Versions: 8.4 >Reporter: Kazuaki Hiraga >Assignee: Tomoko Uchida >Priority: Major > Attachments: LUCENE-9123.patch, LUCENE-9123_8x.patch > > > JapaneseTokenizer with `mode=search` or `mode=extended` doesn't work with > both of SynonymGraphFilter and SynonymFilter when JT generates multiple > tokens as an output. If we use `mode=normal`, it should be fine. However, we > would like to use decomposed tokens that can maximize the chance to increase > recall. 
> Snippet of schema: > {code:xml} > positionIncrementGap="100" autoGeneratePhraseQueries="false"> > > > synonyms="lang/synonyms_ja.txt" > tokenizerFactory="solr.JapaneseTokenizerFactory"/> > > > tags="lang/stoptags_ja.txt" /> > > > > > > minimumLength="4"/> > > > > > {code} > A synonym entry that generates the error: > {noformat} > 株式会社,コーポレーション > {noformat} > The following is the output on the console: > {noformat} > $ ./bin/solr create_core -c jp_test -d ../config/solrconfs > ERROR: Error CREATEing SolrCore 'jp_test': Unable to create core [jp_test3] > Caused by: term: 株式会社 analyzed to a token (株式会社) with position increment != 1 > (got: 0) > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-13936) Schema/Config endpoints to modify configset with no core/collection
[ https://issues.apache.org/jira/browse/SOLR-13936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023904#comment-17023904 ] Ishan Chattopadhyaya commented on SOLR-13936: - [~apoorvprecisely], I've updated the SIP document with your text. https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=139627548. Thanks for helping us out with the wording of the SIP. Can you please update the patch/PR with unit tests so that we can quickly review/commit? > Schema/Config endpoints to modify configset with no core/collection > --- > > Key: SOLR-13936 > URL: https://issues.apache.org/jira/browse/SOLR-13936 > Project: Solr > Issue Type: Sub-task > Components: config-api >Reporter: Apoorv Bhawsar >Priority: Minor > Time Spent: 1h 20m > Remaining Estimate: 0h > > All schema/config configurations should work even in cases where a collection > is not associated with them > This jira will involve > 1. Refactoring existing handler/manager to work without {{SolrCore}} > 2. Adding {{/api/cluster}} endpoints to support such modifications > Endpoints - > * {{/api/cluster/configset/\{name}/schema}} > * {{/api/cluster/configset/\{name}/config}} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] dweiss commented on issue #1199: LUCENE-9134: Port ant-regenerate tasks to Gradle build
dweiss commented on issue #1199: LUCENE-9134: Port ant-regenerate tasks to Gradle build URL: https://github.com/apache/lucene-solr/pull/1199#issuecomment-578532163 This is great work, Erick and is very much appreciated. I do have a "but" though -- it's large and goes through a number of those tasks at once. I'm sorry I've been slow in taking in your patches. I can't really find a chunk of time large enough to review and correct certain issues in a large patch like this one. I'd really like to have minimalistic build fragments that only deal with one thing at a time. It's different from ant (and arguably different from how other projects structure gradle builds) but to me it makes reasoning about a particular build aspect simpler. Take jflex for example as it is really self-contained. You need to have access to jflex at a given version (no need to download anything -- you just declare a configuration and a dependency), you need a top-level task (so that it shows up in help) and you need to configure tasks that are attached to it in each project where we generate stuff from jflex files. I just committed an example that regenerates StandardTokenizerImpl in lucene/core - please take a look at the sources and see if it matches what I tried to express above. When you run "gradlew jflex" it'll recreate StandardTokenizerImpl.java... in fact when you run git diff you won't even see the difference because the regenerated file is identical to what it was before (which I think should be an ideal goal for now because we don't want to generate stuff other than ant does). The remaining jflex regeneration targets can be appended to this file, making it a clean, single-objective concern. When or if at some point somebody decides that a different way to deal with jflex files is more attractive (for example use an external plugin or move the custom task to buildSrc) those changes remain pretty much local to this file. This is an automated message from the Apache Git Service. 
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
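The structure dweiss describes -- a tool declared as a configuration dependency, a top-level umbrella task, and per-project generation tasks -- might be sketched roughly as below. This is a hypothetical illustration only; the task names, jflex version, and file path are assumptions, not the actual committed fragment.

```groovy
// Hypothetical sketch of a self-contained jflex build fragment.
// The tool is a declared dependency of a dedicated configuration,
// so nothing needs to be downloaded by hand.
configurations {
  jflex
}

dependencies {
  jflex "de.jflex:jflex:1.7.0" // version is illustrative
}

// Top-level task so it shows up in `gradlew tasks` / help.
task jflex {
  description "Regenerates sources from .jflex files."
  dependsOn "jflexStandardTokenizerImpl" // resolved lazily by name
}

// In each project owning .jflex files, a concrete generation task.
task jflexStandardTokenizerImpl(type: JavaExec) {
  classpath = configurations.jflex
  main = "jflex.Main"
  args "src/java/org/apache/lucene/analysis/standard/StandardTokenizerImpl.jflex"
}
```

Keeping all jflex-related targets in one such file is what makes the concern "single-objective": replacing the mechanism later (an external plugin, a buildSrc task) stays local to this fragment.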
[jira] [Updated] (LUCENE-9166) gradle build: test failures need stacktraces
[ https://issues.apache.org/jira/browse/LUCENE-9166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-9166: Attachment: LUCENE-9166.patch > gradle build: test failures need stacktraces > > > Key: LUCENE-9166 > URL: https://issues.apache.org/jira/browse/LUCENE-9166 > Project: Lucene - Core > Issue Type: Bug >Reporter: Robert Muir >Priority: Major > Attachments: LUCENE-9166.patch > > > Test failures are missing the stacktrace. Worse yet, it tells you go to look > at a separate (very long) filename which also has no stacktrace :( > I know gradle tries really hard to be quiet and not say anything, but when a > test fails, that isn't the time or place :) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9166) gradle build: test failures need stacktraces
[ https://issues.apache.org/jira/browse/LUCENE-9166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023906#comment-17023906 ] Robert Muir commented on LUCENE-9166: - Attached is a fix. Gradle has an inappropriate "stack trace filter" by default that is removing all of the stacktrace (especially if you hit exc say from a base test class, such as LuceneTestCase) before: {noformat} org.apache.lucene.TestDemo > classMethod FAILED java.lang.IllegalArgumentException: An SPI class of type org.apache.lucene.codecs.Codec with name 'BOGUS' does not exist. You need to add the corresponding JAR file supporting this SPI to your classpath. The current classpath supports the following names: [Lucene84, Asserting, CheapBastard, FastCompressingStoredFields, FastDecompressionCompressingStoredFields, HighCompressionCompressingStoredFields, DummyCompressingStoredFields, SimpleText] {noformat} after: {noformat} org.apache.lucene.TestDemo > classMethod FAILED java.lang.IllegalArgumentException: An SPI class of type org.apache.lucene.codecs.Codec with name 'BOGUS' does not exist. You need to add the corresponding JAR file supporting this SPI to your classpath. 
The current classpath supports the following names: [Lucene84, Asserting, CheapBastard, FastCompressingStoredFields, FastDecompressionCompressingStoredFields, HighCompressionCompressingStoredFields, DummyCompressingStoredFields, SimpleText] at __randomizedtesting.SeedInfo.seed([F03E3EEA39CA3E35]:0) at org.apache.lucene.util.NamedSPILoader.lookup(NamedSPILoader.java:116) at org.apache.lucene.codecs.Codec.forName(Codec.java:116) at org.apache.lucene.util.TestRuleSetupAndRestoreClassEnv.before(TestRuleSetupAndRestoreClassEnv.java:195) at org.apache.lucene.util.AbstractBeforeAfterRule$1.evaluate(AbstractBeforeAfterRule.java:44) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleStoreClassName$1.evaluate(TestRuleStoreClassName.java:41) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40) at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:40) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:53) at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:47) at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:64) at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:54) at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) at com.carrotsearch.randomizedtesting.ThreadLeakControl$StatementRunner.run(ThreadLeakControl.java:370) at com.carrotsearch.randomizedtesting.ThreadLeakControl.lambda$forkTimeoutingTask$0(ThreadLeakControl.java:826) at 
java.base/java.lang.Thread.run(Thread.java:830) {noformat} > gradle build: test failures need stacktraces > > > Key: LUCENE-9166 > URL: https://issues.apache.org/jira/browse/LUCENE-9166 > Project: Lucene - Core > Issue Type: Bug >Reporter: Robert Muir >Priority: Major > Attachments: LUCENE-9166.patch > > > Test failures are missing the stacktrace. Worse yet, it tells you go to look > at a separate (very long) filename which also has no stacktrace :( > I know gradle tries really hard to be quiet and not say anything, but when a > test fails, that isn't the time or place :) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
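The kind of knob involved can be sketched with Gradle's test-logging DSL. This is illustrative only -- the attached LUCENE-9166.patch is the authoritative fix; the snippet merely shows where Gradle's default trace filtering and truncation are configured.

```groovy
// Sketch: print full, unfiltered stack traces for failing tests.
tasks.withType(Test) {
  testLogging {
    events "failed"
    exceptionFormat "full"   // whole exception chain, not just the message
    showStackTraces true
    stackTraceFilters = []   // disable Gradle's default stack trace filtering
  }
}
```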
[jira] [Created] (SOLR-14220) Unable to build 7_7 or 8_4 due to missing dependency
Karl Stoney created SOLR-14220: -- Summary: Unable to build 7_7 or 8_4 due to missing dependency Key: SOLR-14220 URL: https://issues.apache.org/jira/browse/SOLR-14220 Project: Solr Issue Type: Bug Security Level: Public (Default Security Level. Issues are Public) Components: Build Affects Versions: 8.4, 7.7 Reporter: Karl Stoney Attempting to build from: 7_7: https://github.com/apache/lucene-solr/commit/7a309c21ebbc1b08d9edf67802b63fc0bc7affcf or 8_4: https://github.com/apache/lucene-solr/commit/7d3ac7c284b26ce62f41d3b8686f70c7d6bd758d Results in the same build failure: {code:java} BUILD FAILED /usr/local/autotrader/app/lucene-solr/solr/build.xml:685: The following error occurred while executing this line: /usr/local/autotrader/app/lucene-solr/solr/build.xml:656: The following error occurred while executing this line: /usr/local/autotrader/app/lucene-solr/lucene/common-build.xml:653: Error downloading wagon provider from the remote repository: Missing: -- 1) org.apache.maven.wagon:wagon-ssh:jar:1.0-beta-7 Try downloading the file manually from the project website. Then, install it using the command: mvn install:install-file -DgroupId=org.apache.maven.wagon -DartifactId=wagon-ssh -Dversion=1.0-beta-7 -Dpackaging=jar -Dfile=/path/to/file Alternatively, if you host your own repository you can deploy the file there: mvn deploy:deploy-file -DgroupId=org.apache.maven.wagon -DartifactId=wagon-ssh -Dversion=1.0-beta-7 -Dpackaging=jar -Dfile=/path/to/file -Durl=[url] -DrepositoryId=[id] Path to dependency: 1) unspecified:unspecified:jar:0.0 2) org.apache.maven.wagon:wagon-ssh:jar:1.0-beta-7 -- 1 required artifact is missing. for artifact: unspecified:unspecified:jar:0.0 from the specified remote repositories: central (http://repo1.maven.org/maven2) {code} Previously building 7_7 from 3aad3311a97256a8537dd04165c67edcce1c153c, and 8_4 from c0b96fd305946b2564b967272e6e23c59ab0b5da worked fine. 
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-12325) introduce uniqueBlockQuery(parent:true) aggregation for JSON Facet
[ https://issues.apache.org/jira/browse/SOLR-12325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023923#comment-17023923 ] Mikhail Khludnev commented on SOLR-12325: - [~munendrasn] thanks for your feedback. I think it's worth having tests covering absent fields, values, no matches, and so on. I don't share the concern about introducing a new aggregation by suffixing the existing one. I feel like the proposal of introducing a first-arg enum is inconvenient for users; for example, the {{query}} enum mimics {{json.query}}, and I'm afraid it will just confuse users. However, given that {{FunctionQParser.parseNestedQuery()}} requires the argument to start with {{{}} or {{$}}, we can recognize the query case in: {{uniqueBlock(\{!v=type_s:parent})}} and {{uniqueBlock(\{!v=$type_param})}}. But it's not possible to distinguish {{uniqueBlock($field_or_q_param)}} nor handle {{uniqueBlock(\{! v=type_s:parent})}}. What does everyone think about it? > introduce uniqueBlockQuery(parent:true) aggregation for JSON Facet > -- > > Key: SOLR-12325 > URL: https://issues.apache.org/jira/browse/SOLR-12325 > Project: Solr > Issue Type: New Feature > Components: Facet Module >Reporter: Mikhail Khludnev >Assignee: Mikhail Khludnev >Priority: Major > Fix For: 8.5 > > Attachments: SOLR-12325.patch, SOLR-12325.patch, SOLR-12325.patch > > Time Spent: 1.5h > Remaining Estimate: 0h > > It might be faster twin for {{uniqueBlock(\_root_)}}. Please utilise buildin > query parsing method, don't invent your own. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-12325) introduce uniqueBlockQuery(parent:true) aggregation for JSON Facet
[ https://issues.apache.org/jira/browse/SOLR-12325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023923#comment-17023923 ] Mikhail Khludnev edited comment on SOLR-12325 at 1/26/20 8:50 PM: -- [~munendrasn] thanks for your feedback. I think it's worth having tests covering absent fields, values, no matches, and so on. I don't share the concern about introducing a new aggregation by suffixing the existing one. I feel like the proposal of introducing a first-arg enum is inconvenient for users; for example, the {{query}} enum mimics {{json.query}}, and I'm afraid it will just confuse users. However, given that {{FunctionQParser.parseNestedQuery()}} requires the argument to start with {{{!}} or {{$}}, we can recognize the query case in: uniqueBlock(\{!v=type_s:parent}) and uniqueBlock(\{!v=$type_param}). But it's not possible to distinguish uniqueBlock($field_or_q_param) nor handle uniqueBlock(\{! v=type_s:parent}). What does everyone think about it? was (Author: mkhludnev): [~munendrasn] thanks for your feedback. I think it's worth having tests covering absent fields, values, no matches, and so on. I don't share the concern about introducing a new aggregation by suffixing the existing one. I feel like the proposal of introducing a first-arg enum is inconvenient for users; for example, the {{query}} enum mimics {{json.query}}, and I'm afraid it will just confuse users. However, given that {{FunctionQParser.parseNestedQuery()}} requires the argument to start with {{{}} or {{$}}, we can recognize the query case in: {{uniqueBlock(\{!v=type_s:parent})}} and {{uniqueBlock(\{!v=$type_param})}}. But it's not possible to distinguish {{uniqueBlock($field_or_q_param)}} nor handle {{uniqueBlock(\{! v=type_s:parent})}}. 
What does everyone think about it? > introduce uniqueBlockQuery(parent:true) aggregation for JSON Facet > -- > > Key: SOLR-12325 > URL: https://issues.apache.org/jira/browse/SOLR-12325 > Project: Solr > Issue Type: New Feature > Components: Facet Module >Reporter: Mikhail Khludnev >Assignee: Mikhail Khludnev >Priority: Major > Fix For: 8.5 > > Attachments: SOLR-12325.patch, SOLR-12325.patch, SOLR-12325.patch > > Time Spent: 1.5h > Remaining Estimate: 0h > > It might be faster twin for {{uniqueBlock(\_root_)}}. Please utilise buildin > query parsing method, don't invent your own. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-12325) introduce uniqueBlockQuery(parent:true) aggregation for JSON Facet
[ https://issues.apache.org/jira/browse/SOLR-12325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023923#comment-17023923 ] Mikhail Khludnev edited comment on SOLR-12325 at 1/26/20 8:50 PM: -- [~munendrasn] thanks for your feedback. I think it's worth having tests covering absent fields, values, no matches, and so on. I don't share the concern about introducing a new aggregation by suffixing the existing one. I feel like the proposal of introducing a first-arg enum is inconvenient for users; for example, the {{query}} enum mimics {{json.query}}, and I'm afraid it will just confuse users. However, given that {{FunctionQParser.parseNestedQuery()}} requires the argument to start with {{\{!}} or {{$}}, we can recognize the query case in: uniqueBlock(\{!v=type_s:parent}) and uniqueBlock(\{!v=$type_param}). But it's not possible to distinguish uniqueBlock($field_or_q_param) nor handle uniqueBlock(\{! v=type_s:parent}). What does everyone think about it? was (Author: mkhludnev): [~munendrasn] thanks for your feedback. I think it's worth having tests covering absent fields, values, no matches, and so on. I don't share the concern about introducing a new aggregation by suffixing the existing one. I feel like the proposal of introducing a first-arg enum is inconvenient for users; for example, the {{query}} enum mimics {{json.query}}, and I'm afraid it will just confuse users. However, given that {{FunctionQParser.parseNestedQuery()}} requires the argument to start with {{{!}} or {{$}}, we can recognize the query case in: uniqueBlock(\{!v=type_s:parent}) and uniqueBlock(\{!v=$type_param}). But it's not possible to distinguish uniqueBlock($field_or_q_param) nor handle uniqueBlock(\{! v=type_s:parent}). What does everyone think about it? 
> introduce uniqueBlockQuery(parent:true) aggregation for JSON Facet > -- > > Key: SOLR-12325 > URL: https://issues.apache.org/jira/browse/SOLR-12325 > Project: Solr > Issue Type: New Feature > Components: Facet Module >Reporter: Mikhail Khludnev >Assignee: Mikhail Khludnev >Priority: Major > Fix For: 8.5 > > Attachments: SOLR-12325.patch, SOLR-12325.patch, SOLR-12325.patch > > Time Spent: 1.5h > Remaining Estimate: 0h > > It might be faster twin for {{uniqueBlock(\_root_)}}. Please utilise buildin > query parsing method, don't invent your own. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-12325) introduce uniqueBlockQuery(parent:true) aggregation for JSON Facet
[ https://issues.apache.org/jira/browse/SOLR-12325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023923#comment-17023923 ] Mikhail Khludnev edited comment on SOLR-12325 at 1/26/20 8:51 PM: -- [~munendrasn] thanks for your feedback. I think it's worth having tests covering absent fields, values, no matches, and so on. I don't share the concern about introducing a new aggregation by suffixing the existing one. I feel like the proposal of introducing a first-arg enum is inconvenient for users; for example, the {{query}} enum mimics {{json.query}}, and I'm afraid it will just confuse users. However, given that {{FunctionQParser.parseNestedQuery()}} requires the argument to start with a curly brace or a dollar sign, we can recognize the query case in: uniqueBlock(\{!v=type_s:parent}) and uniqueBlock(\{!v=$type_param}). But it's not possible to distinguish uniqueBlock($field_or_q_param) nor handle uniqueBlock(\{! v=type_s:parent}). What does everyone think about it? was (Author: mkhludnev): [~munendrasn] thanks for your feedback. I think it's worth having tests covering absent fields, values, no matches, and so on. I don't share the concern about introducing a new aggregation by suffixing the existing one. I feel like the proposal of introducing a first-arg enum is inconvenient for users; for example, the {{query}} enum mimics {{json.query}}, and I'm afraid it will just confuse users. However, given that {{FunctionQParser.parseNestedQuery()}} requires the argument to start with \{! or \$, we can recognize the query case in: uniqueBlock(\{!v=type_s:parent}) and uniqueBlock(\{!v=$type_param}). But it's not possible to distinguish uniqueBlock($field_or_q_param) nor handle uniqueBlock(\{! v=type_s:parent}). What does everyone think about it? 
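To make the syntax under discussion concrete, a JSON Facet request combining the existing {{uniqueBlock}} with the proposed query form might look roughly like this. The field names are made up for illustration, and {{uniqueBlockQuery}} is the proposal being discussed on this issue, not released syntax:

```json
{
  "query": "type_s:child",
  "facet": {
    "categories": {
      "type": "terms",
      "field": "category_s",
      "facet": {
        "parents": "uniqueBlock(_root_)",
        "parents_by_query": "uniqueBlockQuery(type_s:parent)"
      }
    }
  }
}
```

Both aggregations count distinct parent blocks per bucket; the query form would identify parents by a query instead of by the {{\_root_}} field.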
[jira] [Commented] (SOLR-14202) Old segments are not deleted after commit
[ https://issues.apache.org/jira/browse/SOLR-14202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023930#comment-17023930 ] Erick Erickson commented on SOLR-14202: --- It ought not to be hard to alter the attached program to do two things: 1> index into whatever fields you really use; you can see that the ones I used are pretty generic. 2> use whatever component you think is relevant to your setup. You mentioned the suggester, for instance. I believe Lucene checks on startup for segments that the segments_gen file does _not_ point to and deletes them; this is consistent with a searcher not being closed and with the files disappearing on restart rather than shutdown. If you can share your conf directory, maybe I'll have some time to run a test in parallel. > Old segments are not deleted after commit > - > > Key: SOLR-14202 > URL: https://issues.apache.org/jira/browse/SOLR-14202 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: SolrCloud >Affects Versions: 8.4 >Reporter: Jörn Franke >Priority: Major > Attachments: eoe.zip > > > The data directory of a collection is growing and growing. It seems that old > segments are not deleted; they are only deleted during start of Solr. > How to reproduce: have any collection (e.g. the example collection) and start > indexing documents. Even during the indexing the data directory grows > significantly - much more than expected (by several orders of magnitude). If certain > documents are updated (without significantly increasing the amount of data), > the index data directory again grows by several orders of magnitude. Even for small > collections the needed space explodes. > This shrinks significantly if Solr is stopped and then started. During > startup (not shutdown) Solr purges all those segments if not needed (* > sometimes some, but not a significant amount, is deleted during shutdown).
This > is of course not a good workaround for normal operations. > It does not seem to have an effect on queries (their performance does not seem > to change). > The configs have not changed before and after the upgrade (e.g. from Solr 8.2 > to 8.3 to 8.4, not across major versions), so I assume it could be related to > Solr 8.4. It may also have been present in Solr 8.3 (not sure), but not in 8.2. > > IndexConfig is pretty much default: lock type: native, autoCommit: 15000, > openSearcher=false, autoSoftCommit: -1 (reproducible with autoCommit 5000). > Nevertheless, it did not happen in previous versions of Solr and the config > did not change.
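For reference, the settings the reporter lists correspond roughly to a solrconfig.xml fragment like the one below. This is a sketch of a default-style configuration, not the reporter's actual file; note that in solrconfig.xml the autoCommit settings live under updateHandler, while lockType lives under indexConfig:

```xml
<indexConfig>
  <lockType>native</lockType>
</indexConfig>

<updateHandler class="solr.DirectUpdateHandler2">
  <!-- hard commit every 15s without opening a new searcher -->
  <autoCommit>
    <maxTime>15000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <!-- soft commits disabled -->
  <autoSoftCommit>
    <maxTime>-1</maxTime>
  </autoSoftCommit>
</updateHandler>
```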
[jira] [Commented] (SOLR-11207) Add OWASP dependency checker to detect security vulnerabilities in third party libraries
[ https://issues.apache.org/jira/browse/SOLR-11207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023939#comment-17023939 ] Jan Høydahl commented on SOLR-11207: Thanks for the cleanup; a separate task 'owasp' and using the property to attach it to check makes sense! Closing this. > Add OWASP dependency checker to detect security vulnerabilities in third > party libraries > > > Key: SOLR-11207 > URL: https://issues.apache.org/jira/browse/SOLR-11207 > Project: Solr > Issue Type: Improvement > Components: Build >Affects Versions: 6.0 >Reporter: Hrishikesh Gadre >Assignee: Jan Høydahl >Priority: Major > Time Spent: 3h 20m > Remaining Estimate: 0h > > The Lucene/Solr project depends on a number of third-party libraries. Some of those > libraries contain security vulnerabilities. Upgrading to versions of those > libraries that have fixes for those vulnerabilities is a simple, critical > step we can take to improve the security of the system. But for that we need > a tool which can scan the Lucene/Solr dependencies and look up known > vulnerabilities in a security database. > I found that [OWASP > dependency-checker|https://jeremylong.github.io/DependencyCheck/dependency-check-ant/] > can be used for this purpose. It provides an Ant task which we can include in > the Lucene/Solr build. We also need to figure out how (and when) to invoke > this dependency-checker. But this can be figured out once we complete the > first step of integrating this tool with the Lucene/Solr build system.
[jira] [Resolved] (SOLR-11207) Add OWASP dependency checker to detect security vulnerabilities in third party libraries
[ https://issues.apache.org/jira/browse/SOLR-11207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl resolved SOLR-11207. Fix Version/s: 8.5 Resolution: Fixed
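The arrangement Jan describes - a standalone scan task that is only wired into {{check}} when a property is set - could be sketched in a build.gradle along these lines. The plugin version, task name, and property name here are illustrative, not necessarily the ones committed on SOLR-11207:

```groovy
// Sketch only: names and versions are illustrative.
plugins {
  id 'org.owasp.dependencycheck' version '5.3.0'
}

// A separate 'owasp' task that runs the vulnerability scan on demand.
task owasp {
  dependsOn 'dependencyCheckAnalyze'
}

// Attach it to 'check' only when explicitly requested, e.g.
//   gradlew check -Pvalidation.owasp=true
if (Boolean.parseBoolean(project.findProperty('validation.owasp') ?: 'false')) {
  check.dependsOn owasp
}
```

Keeping the scan opt-in avoids slowing down every {{check}} run with a network-bound CVE database update, while still giving CI a single switch to turn it on.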
[GitHub] [lucene-solr] ErickErickson commented on issue #1199: LUCENE-9134: Port ant-regenerate tasks to Gradle build
ErickErickson commented on issue #1199: LUCENE-9134: Port ant-regenerate tasks to Gradle build URL: https://github.com/apache/lucene-solr/pull/1199#issuecomment-578548576 Starting over with a new model This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9134) Port ant-regenerate tasks to Gradle build
[ https://issues.apache.org/jira/browse/LUCENE-9134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023947#comment-17023947 ] Erick Erickson commented on LUCENE-9134: OK, thanks Dawid. I suspected some/all of what I've done so far would be throw-away while I got my feet wet with "the gradle way". Or maybe that's "Dawid's way" ;) And, for that matter, understood what the heck the ant stuff was doing. Humor aside, it's great that you're willing to lend some structure to the gradle effort, that helps keep things coherent rather than ad-hoc, with many different structures depending on who did which bit. I'll close the PR and start over with your model, now that I have an approach I'm _starting_ to see how they all fit together, and I can do these in smaller chunks. > Port ant-regenerate tasks to Gradle build > - > > Key: LUCENE-9134 > URL: https://issues.apache.org/jira/browse/LUCENE-9134 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Erick Erickson >Assignee: Erick Erickson >Priority: Major > Attachments: LUCENE-9134.patch, core_regen.patch > > Time Spent: 5.5h > Remaining Estimate: 0h > > Here are the "regenerate" targets I found in the ant version. There are a > couple that I don't have evidence for or against being rebuilt > // Very top level > {code:java} > ./build.xml: > ./build.xml: failonerror="true"> > ./build.xml: depends="regenerate,-check-after-regeneration"/> > {code} > // top level Lucene. This includes the core/build.xml and > test-framework/build.xml files > {code:java} > ./lucene/build.xml: > ./lucene/build.xml: inheritall="false"> > ./lucene/build.xml: > {code} > // This one has quite a number of customizations to > {code:java} > ./lucene/core/build.xml: depends="createLevAutomata,createPackedIntSources,jflex"/> > {code} > // This one has a bunch of code modifications _after_ javacc is run on > certain of the > // output files. Save this one for last? 
> {code:java} > ./lucene/queryparser/build.xml: > {code} > // the files under ../lucene/analysis... are pretty self contained. I expect > these could be done as a unit > {code:java} > ./lucene/analysis/build.xml: > ./lucene/analysis/build.xml: > ./lucene/analysis/common/build.xml: depends="jflex,unicode-data"/> > ./lucene/analysis/icu/build.xml: depends="gen-utr30-data-files,gennorm2,genrbbi"/> > ./lucene/analysis/kuromoji/build.xml: depends="build-dict"/> > ./lucene/analysis/nori/build.xml: depends="build-dict"/> > ./lucene/analysis/opennlp/build.xml: depends="train-test-models"/> > {code} > > // These _are_ regenerated from the top-level regenerate target, but for -- > LUCENE-9080//the changes were only in imports so there are no > //corresponding files checked in in that JIRA > {code:java} > ./lucene/expressions/build.xml: depends="run-antlr"/> > {code} > // Apparently unrelated to ./lucene/analysis/opennlp/build.xml > "train-test-models" target > // Apparently not rebuilt from the top level, but _are_ regenerated when > executed from > // ./solr/contrib/langid > {code:java} > ./solr/contrib/langid/build.xml: depends="train-test-models"/> > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] ErickErickson closed pull request #1199: LUCENE-9134: Port ant-regenerate tasks to Gradle build
ErickErickson closed pull request #1199: LUCENE-9134: Port ant-regenerate tasks to Gradle build URL: https://github.com/apache/lucene-solr/pull/1199
[GitHub] [lucene-solr] msokolov commented on issue #564: prorated early termination
msokolov commented on issue #564: prorated early termination URL: https://github.com/apache/lucene-solr/pull/564#issuecomment-578549784 Abandoning as I plan to post a better alternative that achieves the same result without the random behavior.
[GitHub] [lucene-solr] msokolov closed pull request #564: prorated early termination
msokolov closed pull request #564: prorated early termination URL: https://github.com/apache/lucene-solr/pull/564
[jira] [Updated] (LUCENE-9134) Port ant-regenerate tasks to Gradle build
[ https://issues.apache.org/jira/browse/LUCENE-9134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson updated LUCENE-9134: --- Description: Take II about organizing this beast. A list of items that needs to be added or requires work. If you'd like to work on any of these, please add your name to the list. See process comments at parent (LUCENE-9077) * Implement jflex task in lucene/core * Implement jflex tasks in lucene/analysis * Implement javacc tasks in lucene/queryparser * Implement javacc tasks in solr/core * Implement python tasks in lucene (? there are several javadocs mentions in the build.xml, this may be irrelevant to the Gradle effort). * Implement python tasks in lucene/core * Implement python tasks in lucene/analysis * Here are the "regenerate" targets I found in the ant version. There are a couple that I don't have evidence for or against being rebuilt // Very top level {code:java} ./build.xml: ./build.xml: ./build.xml: {code} // top level Lucene. This includes the core/build.xml and test-framework/build.xml files {code:java} ./lucene/build.xml: ./lucene/build.xml: ./lucene/build.xml: {code} // This one has quite a number of customizations to {code:java} ./lucene/core/build.xml: {code} // This one has a bunch of code modifications _after_ javacc is run on certain of the // output files. Save this one for last? {code:java} ./lucene/queryparser/build.xml: {code} // the files under ../lucene/analysis... are pretty self contained. 
I expect these could be done as a unit {code:java} ./lucene/analysis/build.xml: ./lucene/analysis/build.xml: ./lucene/analysis/common/build.xml: ./lucene/analysis/icu/build.xml: ./lucene/analysis/kuromoji/build.xml: ./lucene/analysis/nori/build.xml: ./lucene/analysis/opennlp/build.xml: {code} // These _are_ regenerated from the top-level regenerate target, but for – LUCENE-9080//the changes were only in imports so there are no //corresponding files checked in in that JIRA {code:java} ./lucene/expressions/build.xml: {code} // Apparently unrelated to ./lucene/analysis/opennlp/build.xml "train-test-models" target // Apparently not rebuilt from the top level, but _are_ regenerated when executed from // ./solr/contrib/langid {code:java} ./solr/contrib/langid/build.xml: {code} was: Here are the "regenerate" targets I found in the ant version. There are a couple that I don't have evidence for or against being rebuilt // Very top level {code:java} ./build.xml: ./build.xml: ./build.xml: {code} // top level Lucene. This includes the core/build.xml and test-framework/build.xml files {code:java} ./lucene/build.xml: ./lucene/build.xml: ./lucene/build.xml: {code} // This one has quite a number of customizations to {code:java} ./lucene/core/build.xml: {code} // This one has a bunch of code modifications _after_ javacc is run on certain of the // output files. Save this one for last? {code:java} ./lucene/queryparser/build.xml: {code} // the files under ../lucene/analysis... are pretty self contained. 
I expect these could be done as a unit {code:java} ./lucene/analysis/build.xml: ./lucene/analysis/build.xml: ./lucene/analysis/common/build.xml: ./lucene/analysis/icu/build.xml: ./lucene/analysis/kuromoji/build.xml: ./lucene/analysis/nori/build.xml: ./lucene/analysis/opennlp/build.xml: {code} // These _are_ regenerated from the top-level regenerate target, but for -- LUCENE-9080//the changes were only in imports so there are no //corresponding files checked in in that JIRA {code:java} ./lucene/expressions/build.xml: {code} // Apparently unrelated to ./lucene/analysis/opennlp/build.xml "train-test-models" target // Apparently not rebuilt from the top level, but _are_ regenerated when executed from // ./solr/contrib/langid {code:java} ./solr/contrib/langid/build.xml: {code} > Port ant-regenerate tasks to Gradle build > - > > Key: LUCENE-9134 > URL: https://issues.apache.org/jira/browse/LUCENE-9134 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Erick Erickson >Assignee: Erick Erickson >Priority: Major > Attachments: LUCENE-9134.patch, core_regen.patch > > Time Spent: 5h 50m > Remaining Estimate: 0h > > Take II about organizing this beast. > A list of items that needs to be added or requires work. If you'd like to > work on any of these, please add your name to the list. See process comments > at parent (LUCENE-9077) > * Implement jflex task in lucene/core > * Implement jflex tasks in lucene/analysis > * Implement javacc tasks in lucene/queryparser > * Implement javacc tasks in solr/core > * Implement python tasks in lucene (? there are several javadocs mentions in > the build.xml, this may be irrelevant to the Gradle effort). > * Implement python tasks in lu
[jira] [Updated] (LUCENE-9134) Port ant-regenerate tasks to Gradle build
[ https://issues.apache.org/jira/browse/LUCENE-9134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson updated LUCENE-9134: --- Description: Take II about organizing this beast. A list of items that needs to be added or requires work. If you'd like to work on any of these, please add your name to the list. See process comments at parent (LUCENE-9077) * Implement jflex task in lucene/core * Implement jflex tasks in lucene/analysis * Implement javacc tasks in lucene/queryparser (EOE) * Implement javacc tasks in solr/core (EOE) * Implement python tasks in lucene (? there are several javadocs mentions in the build.xml, this may be irrelevant to the Gradle effort). * Implement python tasks in lucene/core * Implement python tasks in lucene/analysis Here are the "regenerate" targets I found in the ant version. There are a couple that I don't have evidence for or against being rebuilt // Very top level {code:java} ./build.xml: ./build.xml: ./build.xml: {code} // top level Lucene. This includes the core/build.xml and test-framework/build.xml files {code:java} ./lucene/build.xml: ./lucene/build.xml: ./lucene/build.xml: {code} // This one has quite a number of customizations to {code:java} ./lucene/core/build.xml: {code} // This one has a bunch of code modifications _after_ javacc is run on certain of the // output files. Save this one for last? {code:java} ./lucene/queryparser/build.xml: {code} // the files under ../lucene/analysis... are pretty self contained. 
I expect these could be done as a unit {code:java} ./lucene/analysis/build.xml: ./lucene/analysis/build.xml: ./lucene/analysis/common/build.xml: ./lucene/analysis/icu/build.xml: ./lucene/analysis/kuromoji/build.xml: ./lucene/analysis/nori/build.xml: ./lucene/analysis/opennlp/build.xml: {code} // These _are_ regenerated from the top-level regenerate target, but for – LUCENE-9080//the changes were only in imports so there are no //corresponding files checked in in that JIRA {code:java} ./lucene/expressions/build.xml: {code} // Apparently unrelated to ./lucene/analysis/opennlp/build.xml "train-test-models" target // Apparently not rebuilt from the top level, but _are_ regenerated when executed from // ./solr/contrib/langid {code:java} ./solr/contrib/langid/build.xml: {code} was: Take II about organizing this beast. A list of items that needs to be added or requires work. If you'd like to work on any of these, please add your name to the list. See process comments at parent (LUCENE-9077) * Implement jflex task in lucene/core * Implement jflex tasks in lucene/analysis * Implement javacc tasks in lucene/queryparser * Implement javacc tasks in solr/core * Implement python tasks in lucene (? there are several javadocs mentions in the build.xml, this may be irrelevant to the Gradle effort). * Implement python tasks in lucene/core * Implement python tasks in lucene/analysis * Here are the "regenerate" targets I found in the ant version. There are a couple that I don't have evidence for or against being rebuilt // Very top level {code:java} ./build.xml: ./build.xml: ./build.xml: {code} // top level Lucene. This includes the core/build.xml and test-framework/build.xml files {code:java} ./lucene/build.xml: ./lucene/build.xml: ./lucene/build.xml: {code} // This one has quite a number of customizations to {code:java} ./lucene/core/build.xml: {code} // This one has a bunch of code modifications _after_ javacc is run on certain of the // output files. Save this one for last? 
{code:java} ./lucene/queryparser/build.xml: {code} // the files under ../lucene/analysis... are pretty self contained. I expect these could be done as a unit {code:java} ./lucene/analysis/build.xml: ./lucene/analysis/build.xml: ./lucene/analysis/common/build.xml: ./lucene/analysis/icu/build.xml: ./lucene/analysis/kuromoji/build.xml: ./lucene/analysis/nori/build.xml: ./lucene/analysis/opennlp/build.xml: {code} // These _are_ regenerated from the top-level regenerate target, but for – LUCENE-9080//the changes were only in imports so there are no //corresponding files checked in in that JIRA {code:java} ./lucene/expressions/build.xml: {code} // Apparently unrelated to ./lucene/analysis/opennlp/build.xml "train-test-models" target // Apparently not rebuilt from the top level, but _are_ regenerated when executed from // ./solr/contrib/langid {code:java} ./solr/contrib/langid/build.xml: {code} > Port ant-regenerate tasks to Gradle build > - > > Key: LUCENE-9134 > URL: https://issues.apache.org/jira/browse/LUCENE-9134 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Erick Erickson >Assignee: Erick Erickson >Priority: Major > Attachments: LUCENE-9134.patch, core_regen.patch > >
[jira] [Comment Edited] (LUCENE-9134) Port ant-regenerate tasks to Gradle build
[ https://issues.apache.org/jira/browse/LUCENE-9134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17023947#comment-17023947 ] Erick Erickson edited comment on LUCENE-9134 at 1/26/20 10:59 PM: -- OK, thanks Dawid. I suspected some/all of what I've done so far would be throw-away while I got my feet wet with "the gradle way". Or maybe that's "Dawid's way" ;) And, for that matter, while I understood what the heck the ant stuff was doing. Humor aside, it's great that you're willing to lend some structure to the gradle effort; that helps keep things coherent rather than ad-hoc, with many different ways of doing something depending on who did which bit. I'll close the PR and start over with your model. Now that I have an approach, I'm _starting_ to see how they all fit together, and I can do these in smaller chunks. Should I put the delete bits in? If so, my impulse would be to put them in the @TaskAction in JFlexTask. Regardless of whether I should, would that be the correct place for something like that? And is there anything left to do with the jflex task you put in for StandardTokenizerImpl before pushing it, except verification or perhaps putting the delete parts back? I'll look at javacc in the meantime.
> Port ant-regenerate tasks to Gradle build > - > > Key: LUCENE-9134 > URL: https://issues.apache.org/jira/browse/LUCENE-9134 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Erick Erickson >Assignee: Erick Erickson >Priority: Major > Attachments: LUCENE-9134.patch, core_regen.patch > > Time Spent: 5h 50m > Remaining Estimate: 0h > > Take II about organizing this beast. > A list of items that needs to be added or requires work. If you'd like to > work on any of these, please add your name to the list. See process comments > at parent (LUCENE-9077) > * Implement jflex task in lucene/core > * Implement jflex tasks in lucene/analysis > * Implement javacc tasks in lucene/queryparser (EOE) > * Implement javacc tasks in solr/core (EOE) > * Implement python tasks in lucene (? there are several javadocs mentions in > the build.xml, this may be irrelevant to the Gradle effort). > * Implement python tasks in lucene/core > * Implement python tasks in lucene/analysis > > Here are the "regenerate" targets I found in the ant version. There are a > couple that I don't have evidence for or against being rebuilt > // Very top level > {code:java} > ./build.xml: > ./build.xml: failonerror="true"> > ./build.xml: depends="regenerate,-check-after-regeneration"/> > {code} > // top level Lucene. This includes the core/build.xml and > test-framework/build.xml files > {code:java} > ./lucene/build.xml: > ./lucene/build.xml: inheritall="false"> > ./lucene/build.xml: > {code} > // This one has quite a number of customizations to > {code:java} > ./lucene/core/build.xml: depends="createLevAutomata,createPackedIntSources,jflex"/> > {code} > // This one has a bunch of code modifications _after_ javacc is run on > certain of the > // output files. Save this one for last? > {code:java} > ./lucene/queryparser/build.xml: > {code} > // the files under ../lucene/analysis... are pretty self contained. 
I expect > these could be done as a unit > {code:java} > ./lucene/analysis/build.xml: > ./lucene/analysis/build.xml: > ./lucene/analysis/common/build.xml: depends="jflex,unicode-data"/> > ./lucene/analysis/icu/build.xml: depends="gen-utr30-data-files,gennorm2,genrbbi"/> > ./lucene/analysis/kuromoji/build.xml: depends="build-dict"/> > ./lucene/analysis/nori/build.xml: depends="build-dict"/> > ./lucene/analysis/opennlp/build.xml: depends="train-test-models"/> > {code} > > // These _are_ regenerated from the top-level regenerate target, but for – > LUCENE-9080//the changes were only in imports so there are no > //corresponding files checked in in that JIRA > {code:java} > ./lucene/expressions/build.xml: depends="run-antlr"/> > {code} > // Apparently unrelated to ./lucene/analysis/opennlp/build.xml > "train-test-models" target > // Apparently not rebuilt from the top level, but _are_ regenerated when > executed from >
[jira] [Created] (SOLR-14221) Upgrade restlet
Jan Høydahl created SOLR-14221: -- Summary: Upgrade restlet Key: SOLR-14221 URL: https://issues.apache.org/jira/browse/SOLR-14221 Project: Solr Issue Type: Improvement Security Level: Public (Default Security Level. Issues are Public) Reporter: Jan Høydahl Upgrade restlet to latest version -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] janhoy opened a new pull request #1211: SOLR-14221: Upgrade restlet to version 2.4.0
janhoy opened a new pull request #1211: SOLR-14221: Upgrade restlet to version 2.4.0 URL: https://github.com/apache/lucene-solr/pull/1211 See https://issues.apache.org/jira/browse/SOLR-14221 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9134) Port ant-regenerate tasks to Gradle build
[ https://issues.apache.org/jira/browse/LUCENE-9134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17024057#comment-17024057 ] Erick Erickson commented on LUCENE-9134: While I'm looking at the javacc task, a looming question for a later task: lucene/util/automaton/createLevAutomata.py wants: "moman/finenight/python". We were getting it from: "https://bitbucket.org/jpbarrette/moman/get/5c5c2a1e4dea.zip";. What's the theory on how to have Gradle deal with it? > Port ant-regenerate tasks to Gradle build > - > > Key: LUCENE-9134 > URL: https://issues.apache.org/jira/browse/LUCENE-9134 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Erick Erickson >Assignee: Erick Erickson >Priority: Major > Attachments: LUCENE-9134.patch, core_regen.patch > > Time Spent: 5h 50m > Remaining Estimate: 0h > > Take II about organizing this beast. > A list of items that needs to be added or requires work. If you'd like to > work on any of these, please add your name to the list. See process comments > at parent (LUCENE-9077) > * Implement jflex task in lucene/core > * Implement jflex tasks in lucene/analysis > * Implement javacc tasks in lucene/queryparser (EOE) > * Implement javacc tasks in solr/core (EOE) > * Implement python tasks in lucene (? there are several javadocs mentions in > the build.xml, this may be irrelevant to the Gradle effort). > * Implement python tasks in lucene/core > * Implement python tasks in lucene/analysis > > Here are the "regenerate" targets I found in the ant version. There are a > couple that I don't have evidence for or against being rebuilt > // Very top level > {code:java} > ./build.xml: > ./build.xml: failonerror="true"> > ./build.xml: depends="regenerate,-check-after-regeneration"/> > {code} > // top level Lucene. 
This includes the core/build.xml and > test-framework/build.xml files > {code:java} > ./lucene/build.xml: > ./lucene/build.xml: inheritall="false"> > ./lucene/build.xml: > {code} > // This one has quite a number of customizations to > {code:java} > ./lucene/core/build.xml: depends="createLevAutomata,createPackedIntSources,jflex"/> > {code} > // This one has a bunch of code modifications _after_ javacc is run on > certain of the > // output files. Save this one for last? > {code:java} > ./lucene/queryparser/build.xml: > {code} > // the files under ../lucene/analysis... are pretty self contained. I expect > these could be done as a unit > {code:java} > ./lucene/analysis/build.xml: > ./lucene/analysis/build.xml: > ./lucene/analysis/common/build.xml: depends="jflex,unicode-data"/> > ./lucene/analysis/icu/build.xml: depends="gen-utr30-data-files,gennorm2,genrbbi"/> > ./lucene/analysis/kuromoji/build.xml: depends="build-dict"/> > ./lucene/analysis/nori/build.xml: depends="build-dict"/> > ./lucene/analysis/opennlp/build.xml: depends="train-test-models"/> > {code} > > // These _are_ regenerated from the top-level regenerate target, but for – > LUCENE-9080//the changes were only in imports so there are no > //corresponding files checked in in that JIRA > {code:java} > ./lucene/expressions/build.xml: depends="run-antlr"/> > {code} > // Apparently unrelated to ./lucene/analysis/opennlp/build.xml > "train-test-models" target > // Apparently not rebuilt from the top level, but _are_ regenerated when > executed from > // ./solr/contrib/langid > {code:java} > ./solr/contrib/langid/build.xml: depends="train-test-models"/> > {code} > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
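On the moman question above, one plausible answer, sketched here as plain Python rather than as a real Gradle task (all names are hypothetical and nothing here is the actual build code): treat the zip like any other cached dependency, keyed by a known checksum, so the download happens once and later builds can work offline:

```python
# Hypothetical caching sketch: fetch an external archive once, verify its
# checksum, and serve later requests from the cache. `fetch` stands in for
# the real HTTP download (e.g. the bitbucket URL in the comment).
import hashlib
import os
import tempfile


def cached_fetch(cache_dir, name, expected_sha256, fetch):
    """Return the cached file path, calling `fetch` only on a cache miss
    or a checksum mismatch."""
    os.makedirs(cache_dir, exist_ok=True)
    path = os.path.join(cache_dir, name)
    if os.path.exists(path):
        with open(path, "rb") as f:
            if hashlib.sha256(f.read()).hexdigest() == expected_sha256:
                return path  # cache hit: no network needed
    data = fetch()           # network download in the real build
    if hashlib.sha256(data).hexdigest() != expected_sha256:
        raise ValueError("checksum mismatch for " + name)
    with open(path, "wb") as f:
        f.write(data)
    return path


# Usage with a fake fetch so the sketch stays self-contained.
calls = []


def fake_fetch():
    calls.append(1)
    return b"moman zip bytes"


sha = hashlib.sha256(b"moman zip bytes").hexdigest()
cache = tempfile.mkdtemp()
cached_fetch(cache, "moman.zip", sha, fake_fetch)
cached_fetch(cache, "moman.zip", sha, fake_fetch)
print(len(calls))  # 1: the second call was served from the cache
```

Pinning the expected checksum also guards against the upstream archive changing silently, which matters for reproducible `regenerate` runs.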
[jira] [Commented] (LUCENE-9123) JapaneseTokenizer with search mode doesn't work with SynonymGraphFilter
[ https://issues.apache.org/jira/browse/LUCENE-9123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17024058#comment-17024058 ] Kazuaki Hiraga commented on LUCENE-9123: [~romseygeek] Thank you for your comments. I think we can modify the output of Kuromoji to deal with the issue at this moment, since the current GraphTokenStream cannot deal with decompounded tokens and, in many situations, we don't think we need to keep the original tokens along with the decompounded ones. So, we can introduce a new option to absorb the originals for now. However, we think either SynonymGraphFilter or TokenStream should be able to deal with complex cases like the ones you have mentioned in a future release of Lucene. [~tomoko], Thank you for your hard work! Please let me know if there is anything I can do to help with your testing or updating. And thank you for creating a ticket that points out the issue in SynonymGraphFilter: LUCENE-9173. > JapaneseTokenizer with search mode doesn't work with SynonymGraphFilter > --- > > Key: LUCENE-9123 > URL: https://issues.apache.org/jira/browse/LUCENE-9123 > Project: Lucene - Core > Issue Type: Bug > Components: modules/analysis >Affects Versions: 8.4 >Reporter: Kazuaki Hiraga >Assignee: Tomoko Uchida >Priority: Major > Attachments: LUCENE-9123.patch, LUCENE-9123_8x.patch > > > JapaneseTokenizer with `mode=search` or `mode=extended` doesn't work with > either SynonymGraphFilter or SynonymFilter when the tokenizer generates multiple > tokens as output. If we use `mode=normal`, it should be fine. However, we > would like to use decomposed tokens to maximize the chance of increasing recall. 
> Snippet of schema:
> {code:xml}
> <fieldType name="text_ja" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="false">
>   <analyzer>
>     <tokenizer class="solr.JapaneseTokenizerFactory" mode="search"/>
>     <filter class="solr.SynonymGraphFilterFactory" synonyms="lang/synonyms_ja.txt" tokenizerFactory="solr.JapaneseTokenizerFactory"/>
>     <filter class="solr.JapaneseBaseFormFilterFactory"/>
>     <filter class="solr.JapanesePartOfSpeechStopFilterFactory" tags="lang/stoptags_ja.txt" />
>     <filter class="solr.CJKWidthFilterFactory"/>
>     <filter class="solr.JapaneseKatakanaStemFilterFactory" minimumLength="4"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
> </fieldType>
> {code}
> A synonym entry that triggers the error:
> {noformat}
> 株式会社,コーポレーション
> {noformat}
> The following is the console output:
> {noformat}
> $ ./bin/solr create_core -c jp_test -d ../config/solrconfs
> ERROR: Error CREATEing SolrCore 'jp_test': Unable to create core [jp_test3]
> Caused by: term: 株式会社 analyzed to a token (株式会社) with position increment != 1 (got: 0)
> {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
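To make the error above concrete, here is a toy model in plain Python (deliberately not Lucene's real API; all function names are invented) of what `mode=search` emits for 株式会社 and why the synonym-rule parser rejects it: the original compound overlaps the first decompounded part, so its position increment is 0 rather than the required 1.

```python
# Toy model of a decompounding tokenizer and the synonym parser's
# position-increment check. Not Lucene code.
def analyze_search_mode(term):
    """Pretend analysis of a compound: decompounded parts plus the
    original compound, which overlaps the first part (increment 0)."""
    # (token, positionIncrement) pairs, roughly as a Japanese tokenizer
    # in search mode might emit them for 株式会社 -> 株式 + 会社 (+ original).
    return [("株式", 1), ("株式会社", 0), ("会社", 1)]


def parse_synonym_term(term, analyze):
    """Mimic the check that produces the error shown in the issue:
    every analyzed token must advance the position by exactly 1."""
    for token, inc in analyze(term):
        if inc != 1:
            raise ValueError(
                "term: %s analyzed to a token (%s) with position "
                "increment != 1 (got: %d)" % (term, token, inc))
    return [tok for tok, _ in analyze(term)]


try:
    parse_synonym_term("株式会社", analyze_search_mode)
except ValueError as e:
    print(e)
```

With `mode=normal` the toy analyzer would emit a single token at increment 1 and the check would pass, which matches the observation in the issue description.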
[jira] [Commented] (SOLR-14095) Replace Java serialization with Javabin in Overseer operations
[ https://issues.apache.org/jira/browse/SOLR-14095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17024064#comment-17024064 ] Tomas Eduardo Fernandez Lobbe commented on SOLR-14095: -- Thanks for testing this Andy! I'll take a look at this tomorrow. Can you include the steps you did to reproduce this? Are you upgrading from Solr 8.4? or some older version? > Replace Java serialization with Javabin in Overseer operations > -- > > Key: SOLR-14095 > URL: https://issues.apache.org/jira/browse/SOLR-14095 > Project: Solr > Issue Type: Task >Reporter: Robert Muir >Assignee: Tomas Eduardo Fernandez Lobbe >Priority: Major > Fix For: master (9.0), 8.5 > > Attachments: SOLR-14095-json.patch, json-nl.patch > > Time Spent: 1h 20m > Remaining Estimate: 0h > > Removing the use of serialization is greatly preferred. > But if serialization over the wire must really happen, then we must use JDK's > serialization filtering capability to prevent havoc. > https://docs.oracle.com/javase/10/core/serialization-filtering1.htm#JSCOR-GUID-3ECB288D-E5BD-4412-892F-E9BB11D4C98A -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
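As an aside on the serialization-filtering link in the description: the same "allow-list what deserialization may instantiate" idea exists outside the JDK too. A minimal Python sketch of the concept (illustrative only, not Solr code) using `pickle.Unpickler.find_class`:

```python
# Restrict what unpickling may resolve, analogous in spirit to a JDK
# serialization filter rejecting unexpected classes.
import io
import pickle

SAFE = {("builtins", "dict"), ("builtins", "list"), ("builtins", "str")}


class FilteredUnpickler(pickle.Unpickler):
    def find_class(self, module, name):
        # Reject anything outside the allow-list.
        if (module, name) not in SAFE:
            raise pickle.UnpicklingError("blocked: %s.%s" % (module, name))
        return super().find_class(module, name)


def safe_loads(data):
    return FilteredUnpickler(io.BytesIO(data)).load()


print(safe_loads(pickle.dumps({"state": "active"})))  # plain dict: allowed
```

Of course, as the issue itself argues, the safer route is not to filter native serialization but to avoid it entirely in favor of a constrained wire format like Javabin or JSON.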
[GitHub] [lucene-solr] dsmiley commented on issue #1191: SOLR-14197 Reduce API of SolrResourceLoader
dsmiley commented on issue #1191: SOLR-14197 Reduce API of SolrResourceLoader URL: https://github.com/apache/lucene-solr/pull/1191#issuecomment-578586761 I rebased off master since there were some upstream changes. I also resolved some getInstancePath callers, though I didn't actually remove it yet. I think it can move (not remain in SRL) to ZkSolrResourceLoader (until 9x, then remove) and to a new StandaloneSolrResourceLoader. I think a StandaloneSolrResourceLoader is the next step at this point. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-13897) Unsafe publication of Terms object in ZkShardTerms
[ https://issues.apache.org/jira/browse/SOLR-13897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17024107#comment-17024107 ] ASF subversion and git services commented on SOLR-13897: Commit 776631254ffa900527fa1ed7bcf789265cb289c1 in lucene-solr's branch refs/heads/master from Shalin Shekhar Mangar [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=7766312 ] SOLR-13897: Fix unsafe publication of Terms object in ZkShardTerms that can cause visibility issues and race conditions under contention > Unsafe publication of Terms object in ZkShardTerms > -- > > Key: SOLR-13897 > URL: https://issues.apache.org/jira/browse/SOLR-13897 > Project: Solr > Issue Type: Bug > Components: SolrCloud >Affects Versions: 8.2, 8.3 >Reporter: Shalin Shekhar Mangar >Assignee: Shalin Shekhar Mangar >Priority: Major > Fix For: master (9.0) > > Attachments: SOLR-13897.patch, SOLR-13897.patch, SOLR-13897.patch, > SOLR-13897.patch > > > The Terms object in ZkShardTerms is written using a write lock but reading is > allowed freely. This is not safe and can cause visibility issues and > associated race conditions under contention. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (SOLR-13897) Unsafe publication of Terms object in ZkShardTerms
[ https://issues.apache.org/jira/browse/SOLR-13897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar updated SOLR-13897: - Status: Open (was: Patch Available) > Unsafe publication of Terms object in ZkShardTerms > -- > > Key: SOLR-13897 > URL: https://issues.apache.org/jira/browse/SOLR-13897 > Project: Solr > Issue Type: Bug > Components: SolrCloud >Affects Versions: 8.2, 8.3 >Reporter: Shalin Shekhar Mangar >Assignee: Shalin Shekhar Mangar >Priority: Major > Fix For: master (9.0) > > Attachments: SOLR-13897.patch, SOLR-13897.patch, SOLR-13897.patch, > SOLR-13897.patch > > > The Terms object in ZkShardTerms is written using a write lock but reading is > allowed freely. This is not safe and can cause visibility issues and > associated race conditions under contention. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-13897) Unsafe publication of Terms object in ZkShardTerms
[ https://issues.apache.org/jira/browse/SOLR-13897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17024108#comment-17024108 ] ASF subversion and git services commented on SOLR-13897: Commit 7316391d2dd77c486fa25b8435f0bcde33837a6d in lucene-solr's branch refs/heads/branch_8x from Shalin Shekhar Mangar [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=7316391 ] SOLR-13897: Fix unsafe publication of Terms object in ZkShardTerms that can cause visibility issues and race conditions under contention (cherry picked from commit 776631254ffa900527fa1ed7bcf789265cb289c1) > Unsafe publication of Terms object in ZkShardTerms > -- > > Key: SOLR-13897 > URL: https://issues.apache.org/jira/browse/SOLR-13897 > Project: Solr > Issue Type: Bug > Components: SolrCloud >Affects Versions: 8.2, 8.3 >Reporter: Shalin Shekhar Mangar >Assignee: Shalin Shekhar Mangar >Priority: Major > Fix For: master (9.0) > > Attachments: SOLR-13897.patch, SOLR-13897.patch, SOLR-13897.patch, > SOLR-13897.patch > > > The Terms object in ZkShardTerms is written using a write lock but reading is > allowed freely. This is not safe and can cause visibility issues and > associated race conditions under contention. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Resolved] (SOLR-13897) Unsafe publication of Terms object in ZkShardTerms
[ https://issues.apache.org/jira/browse/SOLR-13897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shalin Shekhar Mangar resolved SOLR-13897. -- Fix Version/s: 8.5 Resolution: Fixed > Unsafe publication of Terms object in ZkShardTerms > -- > > Key: SOLR-13897 > URL: https://issues.apache.org/jira/browse/SOLR-13897 > Project: Solr > Issue Type: Bug > Components: SolrCloud >Affects Versions: 8.2, 8.3 >Reporter: Shalin Shekhar Mangar >Assignee: Shalin Shekhar Mangar >Priority: Major > Fix For: master (9.0), 8.5 > > Attachments: SOLR-13897.patch, SOLR-13897.patch, SOLR-13897.patch, > SOLR-13897.patch > > > The Terms object in ZkShardTerms is written using a write lock but reading is > allowed freely. This is not safe and can cause visibility issues and > associated race conditions under contention. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Created] (LUCENE-9176) TestEstimatePointCount failure after changing number of indexed points
Ignacio Vera created LUCENE-9176: Summary: TestEstimatePointCount failure after changing number of indexed points Key: LUCENE-9176 URL: https://issues.apache.org/jira/browse/LUCENE-9176 Project: Lucene - Core Issue Type: Test Reporter: Ignacio Vera These tests can now create situations in which there is only one leaf node, and the tests do not handle this situation properly. {code:java} ant test -Dtestcase=TestLucene60PointsFormat -Dtests.method=testEstimatePointCount -Dtests.seed=A921F5ACFEF2F5B6 -Dtests.slow=true -Dtests.badapples=true -Dtests.locale=ta-IN -Dtests.timezone=Asia/Kuala_Lumpur -Dtests.asserts=true -Dtests.file.encoding=ISO-8859-1 {code} {code:java} ant test -Dtestcase=TestLucene60PointsFormat -Dtests.method=testEstimatePointCount2Dims -Dtests.seed=99F4A087E8092D56 -Dtests.slow=true -Dtests.badapples=true -Dtests.locale=am-ET -Dtests.timezone=Asia/Calcutta -Dtests.asserts=true -Dtests.file.encoding=US-ASCII {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] iverase opened a new pull request #1212: LUCENE-9176: Handle the case when there is only one leaf node in TestEstimatePointCount
iverase opened a new pull request #1212: LUCENE-9176: Handle the case when there is only one leaf node in TestEstimatePointCount URL: https://github.com/apache/lucene-solr/pull/1212 see https://issues.apache.org/jira/browse/LUCENE-9176 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (SOLR-14194) Allow Highlighting to work for indexes with uniqueKey that is not stored
[ https://issues.apache.org/jira/browse/SOLR-14194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Wislowski updated SOLR-14194: - Attachment: SOLR-14194.patch Status: Patch Available (was: Patch Available) > Allow Highlighting to work for indexes with uniqueKey that is not stored > > > Key: SOLR-14194 > URL: https://issues.apache.org/jira/browse/SOLR-14194 > Project: Solr > Issue Type: Improvement > Components: highlighter >Affects Versions: master (9.0) >Reporter: Andrzej Wislowski >Assignee: David Smiley >Priority: Minor > Labels: highlighter > Fix For: master (9.0) > > Attachments: SOLR-14194.patch, SOLR-14194.patch > > > Highlighting requires uniqueKey to be a stored field. I have changed the > Highlighter to allow returning results on indexes whose uniqueKey is not a > stored field but is saved as a docValues type. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (SOLR-14194) Allow Highlighting to work for indexes with uniqueKey that is not stored
[ https://issues.apache.org/jira/browse/SOLR-14194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Wislowski updated SOLR-14194: - Attachment: (was: SOLR-14194.patch) > Allow Highlighting to work for indexes with uniqueKey that is not stored > > > Key: SOLR-14194 > URL: https://issues.apache.org/jira/browse/SOLR-14194 > Project: Solr > Issue Type: Improvement > Components: highlighter >Affects Versions: master (9.0) >Reporter: Andrzej Wislowski >Assignee: David Smiley >Priority: Minor > Labels: highlighter > Fix For: master (9.0) > > Attachments: SOLR-14194.patch, SOLR-14194.patch > > > Highlighting requires uniqueKey to be a stored field. I have changed > Highlighter allow returning results on indexes with uniqueKey that is a not > stored field, but saved as a docvalue type. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org