[jira] [Commented] (LUCENE-9029) Deprecate SloppyMath toRadians/toDegrees in favor of Java Math

2019-11-14 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9029?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16974003#comment-16974003
 ] 

ASF subversion and git services commented on LUCENE-9029:
-

Commit d62a2dd67e6f45a4b50063d095af6b82f64ec46e in lucene-solr's branch 
refs/heads/branch_8x from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=d62a2dd ]

LUCENE-9029: Deprecate SloppyMath toRadians/toDegrees in favor of Java Math


> Deprecate SloppyMath toRadians/toDegrees in favor of Java Math
> --
>
> Key: LUCENE-9029
> URL: https://issues.apache.org/jira/browse/LUCENE-9029
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Jack Conradson
>Priority: Trivial
> Attachments: LUCENE-9029.patch, LUCENE-9029.patch
>
>
> This change follows a TODO left in SloppyMath to remove toRadians/toDegrees, 
> since from Java 9 onward Math's toRadians/toDegrees are identical. Because 
> these methods/constants are public, deprecation messages are added to each 
> one. Internally, Lucene replaces all uses of the SloppyMath versions with 
> the standard Java Math versions.
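
The change described above amounts to a simple delegation pattern. A minimal, 
hypothetical sketch (class and method bodies are illustrative, not Lucene's 
actual source) might look like:

```java
// Sketch of the deprecation pattern described above: the SloppyMath methods
// simply delegate to java.lang.Math, whose toRadians/toDegrees have produced
// identical results since Java 9.
public class SloppyMathSketch {

    /** @deprecated Use {@link Math#toRadians} instead. */
    @Deprecated
    public static double toRadians(double degrees) {
        return Math.toRadians(degrees); // identical result on Java 9+
    }

    /** @deprecated Use {@link Math#toDegrees} instead. */
    @Deprecated
    public static double toDegrees(double radians) {
        return Math.toDegrees(radians); // identical result on Java 9+
    }
}
```

Callers keep compiling (with a deprecation warning) while migrating to the 
standard Math methods at their own pace.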



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] janhoy commented on issue #908: Change the file format of README files from README.txt to README.md a…

2019-11-14 Thread GitBox
janhoy commented on issue #908: Change the file format of README files from 
README.txt to README.md a…
URL: https://github.com/apache/lucene-solr/pull/908#issuecomment-553777021
 
 
   +1 from me


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services




[jira] [Commented] (LUCENE-8920) Reduce size of FSTs due to use of direct-addressing encoding

2019-11-14 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16974040#comment-16974040
 ] 

ASF subversion and git services commented on LUCENE-8920:
-

Commit 010fb0b9942dc5407d74f1279030805b99a32973 in lucene-solr's branch 
refs/heads/branch_8x from Noble Paul
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=010fb0b ]

LUCENE-8920: precommit errors


> Reduce size of FSTs due to use of direct-addressing encoding 
> -
>
> Key: LUCENE-8920
> URL: https://issues.apache.org/jira/browse/LUCENE-8920
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael Sokolov
>Priority: Minor
> Attachments: TestTermsDictRamBytesUsed.java
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> Some data can lead to worst-case ~4x RAM usage due to this optimization. 
> Several ideas were suggested to combat this on the mailing list:
> bq. I think we can improve the situation here by tracking, per-FST instance, 
> the size increase we're seeing while building (or perhaps doing a preliminary 
> pass before building) in order to decide whether to apply the encoding. 
> bq. We could also make the encoding a bit more efficient. For instance, I 
> noticed that arc metadata is pretty large in some cases (in the 10-20 byte 
> range), which makes gaps very costly. Associating each label with a dense id 
> and having an intermediate lookup, i.e. label -> id and then id -> arc offset 
> instead of label -> arc directly, could save a lot of space in some cases? 
> Also, it seems that we are repeating the label in the arc metadata when 
> array-with-gaps is used, even though it shouldn't be necessary since the 
> label is implicit from the address?
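
The label -> id -> arc-offset indirection suggested above can be illustrated 
with a small, hypothetical sketch. The class and field names below are 
invented for the example and are not Lucene's actual FST internals:

```java
import java.util.HashMap;
import java.util.Map;

// Illustration of the proposed indirection: rather than a direct-addressed
// table indexed by label (which pays one slot for every gap in the label
// space), sparse labels are remapped to dense ids, so the arc-offset table
// itself has no holes.
public class DenseLabelLookup {
    private final Map<Integer, Integer> labelToId = new HashMap<>(); // sparse label -> dense id
    private final long[] idToArcOffset;                              // gap-free offset table

    public DenseLabelLookup(int[] labels, long[] arcOffsets) {
        idToArcOffset = new long[labels.length];
        for (int id = 0; id < labels.length; id++) {
            labelToId.put(labels[id], id); // dense ids assigned in insertion order
            idToArcOffset[id] = arcOffsets[id];
        }
    }

    /** Returns the arc offset for a label, or -1 if the label has no arc. */
    public long arcOffset(int label) {
        Integer id = labelToId.get(label);
        return id == null ? -1 : idToArcOffset[id];
    }
}
```

Even with widely spaced labels (say, 5 and 200), the offset table here holds 
only two entries instead of 200+ direct-addressed slots; the trade-off is the 
extra lookup per arc.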






[jira] [Commented] (LUCENE-8920) Reduce size of FSTs due to use of direct-addressing encoding

2019-11-14 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16974038#comment-16974038
 ] 

ASF subversion and git services commented on LUCENE-8920:
-

Commit fe1653b9385cd6518691b9c397fb512249ca5e78 in lucene-solr's branch 
refs/heads/branch_8x from Michael Sokolov
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=fe1653b ]

LUCENE-8920: refactor FST binary search








[jira] [Commented] (LUCENE-8920) Reduce size of FSTs due to use of direct-addressing encoding

2019-11-14 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16974037#comment-16974037
 ] 

ASF subversion and git services commented on LUCENE-8920:
-

Commit 0f4dcde4d983d7a0c09480dd849eb26bae443528 in lucene-solr's branch 
refs/heads/branch_8x from Michael Sokolov
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=0f4dcde ]

LUCENE-8920: remove Arc setters, moving implementations into Arc, or copying 
data into consumers








[jira] [Commented] (LUCENE-8920) Reduce size of FSTs due to use of direct-addressing encoding

2019-11-14 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16974036#comment-16974036
 ] 

ASF subversion and git services commented on LUCENE-8920:
-

Commit 4836dfc8b808e7c198673d68df9d4ac0eceab0af in lucene-solr's branch 
refs/heads/branch_8x from Michael Sokolov
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=4836dfc ]

LUCENE-8920: encapsulate FST.Arc data








[jira] [Commented] (LUCENE-8920) Reduce size of FSTs due to use of direct-addressing encoding

2019-11-14 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16974043#comment-16974043
 ] 

ASF subversion and git services commented on LUCENE-8920:
-

Commit 07101ed8cc140aafd4b2e1bc00841eb7d0cd037c in lucene-solr's branch 
refs/heads/master from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=07101ed ]

LUCENE-8920: CHANGES entry.








[jira] [Commented] (LUCENE-8920) Reduce size of FSTs due to use of direct-addressing encoding

2019-11-14 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16974042#comment-16974042
 ] 

ASF subversion and git services commented on LUCENE-8920:
-

Commit 3c9140f24348f8d4e28bcc1a844ea503ed264f78 in lucene-solr's branch 
refs/heads/branch_8x from Adrien Grand
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=3c9140f ]

LUCENE-8920: CHANGES entry.








[jira] [Commented] (LUCENE-8920) Reduce size of FSTs due to use of direct-addressing encoding

2019-11-14 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16974041#comment-16974041
 ] 

ASF subversion and git services commented on LUCENE-8920:
-

Commit 5dd9c4c04bcab9911b9a9f0b092eadc50262ee2c in lucene-solr's branch 
refs/heads/branch_8x from Bruno Roustant
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=5dd9c4c ]

LUCENE-8920: Reduce the memory used by direct addressing of arcs (#980)








[jira] [Commented] (LUCENE-8920) Reduce size of FSTs due to use of direct-addressing encoding

2019-11-14 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16974039#comment-16974039
 ] 

ASF subversion and git services commented on LUCENE-8920:
-

Commit e97380ad20d5f6af3a90c37383468678cf6bfcc7 in lucene-solr's branch 
refs/heads/branch_8x from Michael Sokolov
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=e97380a ]

LUCENE-8920: Fix bug preventing FST duplicate tails from being shared when 
encoded as array-with-gaps








[jira] [Resolved] (LUCENE-8920) Reduce size of FSTs due to use of direct-addressing encoding

2019-11-14 Thread Adrien Grand (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-8920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand resolved LUCENE-8920.
--
Fix Version/s: 8.4
   Resolution: Fixed

There were a few conflicts when backporting to branch_8x, so you might want to 
take a second look, [~sokolov] [~bruno.roustant], to make sure I did not get 
anything wrong.

> Reduce size of FSTs due to use of direct-addressing encoding 
> -
>
> Key: LUCENE-8920
> URL: https://issues.apache.org/jira/browse/LUCENE-8920
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael Sokolov
>Priority: Minor
> Fix For: 8.4
>
> Attachments: TestTermsDictRamBytesUsed.java
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>






[jira] [Resolved] (LUCENE-9029) Deprecate SloppyMath toRadians/toDegrees in favor of Java Math

2019-11-14 Thread Adrien Grand (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adrien Grand resolved LUCENE-9029.
--
Fix Version/s: 8.4
   Resolution: Fixed

Merged, thanks [~jdconradson].

> Deprecate SloppyMath toRadians/toDegrees in favor of Java Math
> --
>
> Key: LUCENE-9029
> URL: https://issues.apache.org/jira/browse/LUCENE-9029
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Jack Conradson
>Priority: Trivial
> Fix For: 8.4
>
> Attachments: LUCENE-9029.patch, LUCENE-9029.patch
>
>
> This change follows a TODO left in SloppyMath to remove toRadians/toDegrees, 
> since from Java 9 onward Math's toRadians/toDegrees are identical. Because 
> these methods/constants are public, deprecation messages are added to each 
> one. Internally, Lucene replaces all uses of the SloppyMath versions with 
> the standard Java Math versions.






[GitHub] [lucene-site] adamwalz commented on issue #6: LUCENE-9015 Add notification to .asf-yaml automated Pelican builds

2019-11-14 Thread GitBox
adamwalz commented on issue #6: LUCENE-9015 Add notification to .asf-yaml 
automated Pelican builds
URL: https://github.com/apache/lucene-site/pull/6#issuecomment-553790038
 
 
   Closing since it looks like automated builds are working now





[GitHub] [lucene-site] adamwalz closed pull request #6: LUCENE-9015 Add notification to .asf-yaml automated Pelican builds

2019-11-14 Thread GitBox
adamwalz closed pull request #6: LUCENE-9015 Add notification to .asf-yaml 
automated Pelican builds
URL: https://github.com/apache/lucene-site/pull/6
 
 
   





[jira] [Commented] (LUCENE-9015) Configure branches, auto build and auto stage/publish

2019-11-14 Thread Jira


[ 
https://issues.apache.org/jira/browse/LUCENE-9015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16974054#comment-16974054
 ] 

Jan Høydahl commented on LUCENE-9015:
-

Auto builds are now succeeding - thanks, [~humbedooh], for helping out with 
that pelican build job!

I just enabled the publish-to-staging part of .asf.yaml, so the site should 
soon appear on 
[https://lucene.staged.apache.org|https://lucene.staged.apache.org/] 

> Configure branches, auto build and auto stage/publish
> -
>
> Key: LUCENE-9015
> URL: https://issues.apache.org/jira/browse/LUCENE-9015
> Project: Lucene - Core
>  Issue Type: Sub-task
>Reporter: Jan Høydahl
>Assignee: Jan Høydahl
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Commit to master should build and publish the staging site
> Find a simple way to trigger publishing of main site from staging






[jira] [Commented] (LUCENE-9015) Configure branches, auto build and auto stage/publish

2019-11-14 Thread Adam Walz (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16974059#comment-16974059
 ] 

Adam Walz commented on LUCENE-9015:
---

This is great!







[jira] [Created] (SOLR-13931) Decommission Analytics component/handler

2019-11-14 Thread Mikhail Khludnev (Jira)
Mikhail Khludnev created SOLR-13931:
---

 Summary: Decommission Analytics component/handler
 Key: SOLR-13931
 URL: https://issues.apache.org/jira/browse/SOLR-13931
 Project: Solr
  Issue Type: Improvement
  Security Level: Public (Default Security Level. Issues are Public)
  Components: SearchComponents - other
Reporter: Mikhail Khludnev









[jira] [Updated] (SOLR-13931) Decommission Analytics component/handler

2019-11-14 Thread Mikhail Khludnev (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Khludnev updated SOLR-13931:

Description: 
Spin off from SOLR-13904 discussion. 
let's:

# Deprecate the analytics component
# Make a migration path to JSON Facets
# Make sure that JSON Facets has all of the necessary functionality (median, 
unique, percentile, etc.)
# Have analytics queries be converted to json facets, and computed that way.
# Remove the Analytics Component backend
# Down the road stop supporting analytics component requests all together (Make 
users switch to json facets themselves)
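
As a rough illustration of step 2, an aggregation that the analytics component 
computes today might be expressed with the JSON Facet API along these lines 
(field names here are invented for the example, and verifying exact function 
coverage is what step 3 is for):

```json
{
  "query": "*:*",
  "facet": {
    "median_price": "percentile(price,50)",
    "p99_price":    "percentile(price,99)",
    "unique_users": "unique(user_id)"
  }
}
```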







[jira] [Commented] (SOLR-13904) Make Analytics component aware of timeAllowed

2019-11-14 Thread Mikhail Khludnev (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13904?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16974071#comment-16974071
 ] 

Mikhail Khludnev commented on SOLR-13904:
-

Spawned SOLR-13931. Until that's done, looking for a review here. Thanks. 

> Make Analytics component aware of timeAllowed
> -
>
> Key: SOLR-13904
> URL: https://issues.apache.org/jira/browse/SOLR-13904
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Mikhail Khludnev
>Priority: Major
> Attachments: SOLR-13904.patch, SOLR-13904.patch
>
>







[jira] [Commented] (LUCENE-8987) Move Lucene web site from svn to git

2019-11-14 Thread Jira


[ 
https://issues.apache.org/jira/browse/LUCENE-8987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16974093#comment-16974093
 ] 

Jan Høydahl commented on LUCENE-8987:
-

Milestone reached - the ASF buildbot builds and publishes the new website git 
repo to [https://lucene.staged.apache.org|https://lucene.staged.apache.org/]!

> Move Lucene web site from svn to git
> 
>
> Key: LUCENE-8987
> URL: https://issues.apache.org/jira/browse/LUCENE-8987
> Project: Lucene - Core
>  Issue Type: Task
>  Components: general/website
>Reporter: Jan Høydahl
>Assignee: Jan Høydahl
>Priority: Major
> Attachments: lucene-site-repo.png
>
>
> INFRA just enabled [a new way of configuring website 
> build|https://s.apache.org/asfyaml] from a git branch, [see dev list 
> email|https://lists.apache.org/thread.html/b6f7e40bece5e83e27072ecc634a7815980c90240bc0a2ccb417f1fd@%3Cdev.lucene.apache.org%3E].
>  It allows for automatic builds of both staging and production site, much 
> like the old CMS. We can choose to auto publish the html content of an 
> {{output/}} folder, or to have a bot build the site using 
> [Pelican|https://github.com/getpelican/pelican] from a {{content/}} folder.
> The goal of this issue is to explore how this can be done for 
> [http://lucene.apache.org|http://lucene.apache.org/] by creating a new 
> git repo {{lucene-site}}, copying over the site from svn, seeing if it can 
> be "Pelicanized" easily, and then testing staging. Benefits are that more 
> people will be able to edit the web site and we can take PRs from the 
> public (with GitHub preview of pages).
> Non-goals:
>  * Create a new web site or a new graphic design
>  * Change from Markdown to Asciidoc






[jira] [Commented] (LUCENE-9027) SIMD-based decoding of postings lists

2019-11-14 Thread Adrien Grand (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16974100#comment-16974100
 ] 

Adrien Grand commented on LUCENE-9027:
--

I opened LUCENE-9047 to make our APIs little endian.

I plan to merge the attached pull request early next week if there are no 
objections.

> SIMD-based decoding of postings lists
> -
>
> Key: LUCENE-9027
> URL: https://issues.apache.org/jira/browse/LUCENE-9027
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Adrien Grand
>Priority: Minor
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> [~rcmuir] has been mentioning the idea for quite some time that we might be 
> able to write the decoding logic in such a way that Java would use SIMD 
> instructions. More recently [~paul.masurel] wrote a [blog 
> post|https://fulmicoton.com/posts/bitpacking/] that raises the point that 
> Lucene could still decode multiple ints at once in a single instruction by 
> packing two ints in a long and we've had some discussions about what we could 
> try in Lucene to speed up the decoding of postings. This made me want to look 
> a bit deeper at what we could do.
> Our current decoding logic reads data in a byte[] and decodes packed integers 
> from it. Unfortunately it doesn't make use of SIMD instructions and looks 
> like 
> [this|https://github.com/jpountz/decode-128-ints-benchmark/blob/master/src/main/java/jpountz/NaiveByteDecoder.java].
> I confirmed by looking at the generated assembly that if I take an array of 
> integers and shift them all by the same number of bits then Java will use 
> SIMD instructions to shift multiple integers at once. This led me to writing 
> this 
> [implementation|https://github.com/jpountz/decode-128-ints-benchmark/blob/master/src/main/java/jpountz/SimpleSIMDDecoder.java]
>  that tries as much as possible to shift long sequences of ints by the same 
> number of bits to speed up decoding. It is indeed faster than the current 
> logic we have, up to about 2x faster for some numbers of bits per value.
> Currently the best 
> [implementation|https://github.com/jpountz/decode-128-ints-benchmark/blob/master/src/main/java/jpountz/SIMDDecoder.java]
>  I've been able to come up with combines the above idea with the idea that 
> Paul mentioned in his blog that consists of emulating SIMD from Java by 
> packing multiple integers into a long: 2 ints, 4 shorts or 8 bytes. It is a 
> bit harder to read but gives another speedup on top of the above 
> implementation.
> I have a [JMH 
> benchmark|https://github.com/jpountz/decode-128-ints-benchmark/] available in 
> case someone would like to play with this and maybe even come up with an even 
> faster implementation. It is 2-2.5x faster than our current implementation 
> for most numbers of bits per value. I'm copying results here:
> {noformat}
>  * `readLongs` just reads 2*bitsPerValue longs from the ByteBuffer, it serves 
> as
>a baseline.
>  * `decodeNaiveFromBytes` reads a byte[] and decodes from it. This is what the
>current Lucene codec does.
>  * `decodeNaiveFromLongs` decodes from longs on the fly.
>  * `decodeSimpleSIMD` is a simple implementation that relies on how Java
>recognizes some patterns and uses SIMD instructions.
>  * `decodeSIMD` is a more complex implementation that both relies on the C2
>compiler to generate SIMD instructions and encodes 8 bytes, 4 shorts or
>2 ints in a long in order to decompress multiple values at once.
> Benchmark                                       (bitsPerValue)  (byteOrder)  Mode  Cnt   Score   Error   Units
> PackedIntsDecodeBenchmark.decodeNaiveFromBytes               1           LE  thrpt   5  12.912 ± 0.393  ops/us
> PackedIntsDecodeBenchmark.decodeNaiveFromBytes               1           BE  thrpt   5  12.862 ± 0.395  ops/us
> PackedIntsDecodeBenchmark.decodeNaiveFromBytes               2           LE  thrpt   5  13.040 ± 1.162  ops/us
> PackedIntsDecodeBenchmark.decodeNaiveFromBytes               2           BE  thrpt   5  13.027 ± 0.270  ops/us
> PackedIntsDecodeBenchmark.decodeNaiveFromBytes               3           LE  thrpt   5  12.409 ± 0.637  ops/us
> PackedIntsDecodeBenchmark.decodeNaiveFromBytes               3           BE  thrpt   5  12.268 ± 0.947  ops/us
> PackedIntsDecodeBenchmark.decodeNaiveFromBytes               4           LE  thrpt   5  14.177 ± 2.263  ops/us
> PackedIntsDecodeBenchmark.decodeNaiveFromBytes               4           BE  thrpt   5  11.457 ± 0.150  ops/us
> PackedIntsDecodeBenchmark.decodeNaiveFromBytes               5           LE  thrpt   5  10.988 ± 1.179  ops/us
> PackedIntsDecodeBenchmark.decodeNaiveFromBytes               5           BE  thrpt   5  11.226 ± 0.088  ops/us
> PackedIntsDecodeBenchmark.decodeNaive
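The "emulated SIMD" trick described above — packing multiple values into a long so a single shift moves every lane at once — can be illustrated with a small hypothetical Java sketch (an illustration only, not Lucene's actual decoder):

```java
public class PackedShiftDemo {
    // Pack two 32-bit lanes into one long so a single shift moves both lanes at once.
    static long pack(int hi, int lo) {
        return ((long) hi << 32) | (lo & 0xFFFFFFFFL);
    }

    // Shift both lanes right by n bits with one long operation, masking off
    // bits that would otherwise bleed from the high lane into the low lane.
    static long shiftLanesRight(long packed, int n) {
        long laneMask = 0xFFFFFFFFL >>> n;        // valid bits per lane after the shift
        long mask = (laneMask << 32) | laneMask;  // applied to both lanes
        return (packed >>> n) & mask;
    }

    public static void main(String[] args) {
        long p = pack(0xF0, 0xF0);
        long shifted = shiftLanesRight(p, 4);
        System.out.println(((int) (shifted >>> 32)) + " " + ((int) shifted)); // prints "15 15"
    }
}
```

The same idea extends to 4 shorts or 8 bytes per long; the wider the lanes are packed, the more values each shift decodes.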

[jira] [Created] (LUCENE-9048) Tutorial and docs section missing from the new website

2019-11-14 Thread Jira
Jan Høydahl created LUCENE-9048:
---

 Summary: Tutorial and docs section missing from the new website
 Key: LUCENE-9048
 URL: https://issues.apache.org/jira/browse/LUCENE-9048
 Project: Lucene - Core
  Issue Type: Bug
  Components: general/website
Reporter: Jan Høydahl


See [https://lucene.staged.apache.org/solr/resources.html#tutorials]

The Tutorials and Documentation subsections are missing from this page.






[jira] [Commented] (LUCENE-9048) Tutorial and docs section missing from the new website

2019-11-14 Thread Jira


[ 
https://issues.apache.org/jira/browse/LUCENE-9048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16974104#comment-16974104
 ] 

Jan Høydahl commented on LUCENE-9048:
-

[~adamwalz] can you take a look at this one? I see the same in local build.

> Tutorial and docs section missing from the new website
> --
>
> Key: LUCENE-9048
> URL: https://issues.apache.org/jira/browse/LUCENE-9048
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: general/website
>Reporter: Jan Høydahl
>Priority: Major
>
> See [https://lucene.staged.apache.org/solr/resources.html#tutorials]
> The Tutorials and Documentation subsections are missing from this page.






[GitHub] [lucene-solr] chatman commented on a change in pull request #994: SOLR-13662: Package Manager (CLI)

2019-11-14 Thread GitBox
chatman commented on a change in pull request #994: SOLR-13662: Package Manager 
(CLI)
URL: https://github.com/apache/lucene-solr/pull/994#discussion_r346261265
 
 

 ##
 File path: 
solr/core/src/java/org/apache/solr/packagemanager/PackageManager.java
 ##
 @@ -0,0 +1,415 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.packagemanager;
+
+import static org.apache.solr.packagemanager.PackageUtils.getMapper;
+
+import java.io.Closeable;
+import java.io.IOException;
+import java.lang.invoke.MethodHandles;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Scanner;
+
+import org.apache.solr.client.solrj.impl.HttpSolrClient;
+import org.apache.solr.common.SolrException;
+import org.apache.solr.common.SolrException.ErrorCode;
+import org.apache.solr.common.cloud.SolrZkClient;
+import org.apache.solr.packagemanager.SolrPackage.Command;
+import org.apache.solr.packagemanager.SolrPackage.Manifest;
+import org.apache.solr.packagemanager.SolrPackage.Plugin;
+import org.apache.solr.pkg.PackagePluginHolder;
+import org.apache.solr.util.SolrCLI;
+import org.apache.zookeeper.KeeperException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.google.common.base.Strings;
+import com.jayway.jsonpath.JsonPath;
+import com.jayway.jsonpath.PathNotFoundException;
+
+/**
+ * Handles most of the management of packages that are already installed in 
Solr.
+ */
+public class PackageManager implements Closeable {
+
+  final String solrBaseUrl;
+  final HttpSolrClient solrClient;
+  final SolrZkClient zkClient;
+
+  private Map<String, List<SolrPackageInstance>> packages = null;
+
+  private static final Logger log = 
LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
+
+
+  public PackageManager(HttpSolrClient solrClient, String solrBaseUrl, String 
zkHost) {
+this.solrBaseUrl = solrBaseUrl;
+this.solrClient = solrClient;
+this.zkClient = new SolrZkClient(zkHost, 3);
+log.info("Done initializing a zkClient instance...");
+  }
+
+  @Override
+  public void close() throws IOException {
+if (zkClient != null) {
+  zkClient.close();
+}
+  }
+
+  public List<SolrPackageInstance> fetchInstalledPackageInstances() throws SolrException {
+    log.info("Getting packages from packages.json...");
+    List<SolrPackageInstance> ret = new ArrayList<>();
+    packages = new HashMap<String, List<SolrPackageInstance>>();
+try {
+  Map packagesZnodeMap = null;
+
+  if (zkClient.exists("/packages.json", true) == true) {
+packagesZnodeMap = (Map)getMapper().readValue(
+new String(zkClient.getData("/packages.json", null, null, true), 
"UTF-8"), Map.class).get("packages");
+for (Object packageName: packagesZnodeMap.keySet()) {
+  List pkg = (List)packagesZnodeMap.get(packageName);
+  for (Map pkgVersion: (List)pkg) {
+Manifest manifest = PackageUtils.fetchManifest(solrClient, 
solrBaseUrl, pkgVersion.get("manifest").toString(), 
pkgVersion.get("manifestSHA512").toString());
+List solrplugins = manifest.plugins;
+SolrPackageInstance pkgInstance = new 
SolrPackageInstance(packageName.toString(), null, 
+pkgVersion.get("version").toString(), manifest, solrplugins, 
manifest.parameterDefaults);
+        List<SolrPackageInstance> list = packages.containsKey(packageName) ? packages.get(packageName) : new ArrayList<>();
+list.add(pkgInstance);
+packages.put(packageName.toString(), list);
+ret.add(pkgInstance);
+  }
+}
+  }
+} catch (Exception e) {
+  throw new SolrException(ErrorCode.BAD_REQUEST, e);
+}
+log.info("Got packages: "+ret);
+return ret;
+  }
+
+  public Map getPackagesDeployed(String 
collection) {
+String paramsJson = 
PackageUtils.getJsonStringFromUrl(solrClient.getHttpClient(), 
solrBaseUrl+"/api/collections/"+collection+"/config/params/PKG_VERSIONS?omitHeader=true");
 
 Review comment:
   Makes the code look ugly, but done this anyway.



[GitHub] [lucene-solr] chatman commented on issue #994: SOLR-13662: Package Manager (CLI)

2019-11-14 Thread GitBox
chatman commented on issue #994: SOLR-13662: Package Manager (CLI)
URL: https://github.com/apache/lucene-solr/pull/994#issuecomment-553847960
 
 
   > do you limit the calls to /api/collections/${collection}/config* 
/api/collections/${collection}/schema* ?
   
   No, any http request to Solr is supported. If, in future, there's a 
configset API that accepts commands for registering components, then those can 
be used as well. I don't think a whitelisting of paths is a good idea.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services




[jira] [Commented] (LUCENE-9015) Configure branches, auto build and auto stage/publish

2019-11-14 Thread Jira


[ 
https://issues.apache.org/jira/browse/LUCENE-9015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16974166#comment-16974166
 ] 

Jan Høydahl commented on LUCENE-9015:
-

Guess last part of this Jira is to create an asf-site branch, and perhaps write 
a python script that merges all commits from asf-staging over to asf-site.

Then we can close this Jira (actual publish to prod will happen in LUCENE-9034).

> Configure branches, auto build and auto stage/publish
> -
>
> Key: LUCENE-9015
> URL: https://issues.apache.org/jira/browse/LUCENE-9015
> Project: Lucene - Core
>  Issue Type: Sub-task
>Reporter: Jan Høydahl
>Assignee: Jan Høydahl
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Commit to master should build and publish the staging site
> Find a simple way to trigger publishing of main site from staging






[jira] [Created] (SOLR-13932) Review directory locking for Blob interactions

2019-11-14 Thread Ilan Ginzburg (Jira)
Ilan Ginzburg created SOLR-13932:


 Summary: Review directory locking for Blob interactions
 Key: SOLR-13932
 URL: https://issues.apache.org/jira/browse/SOLR-13932
 Project: Solr
  Issue Type: Sub-task
Reporter: Ilan Ginzburg


Review resolution of local index directory content vs Blob copy.

There has been a wrong understanding that the following line acquires a lock on 
the index directory:

 {{solrCore.getDirectoryFactory().get(indexDirPath, 
DirectoryFactory.DirContext.DEFAULT, 
solrCore.getSolrConfig().indexConfig.lockType);}}

From Yonik:

_A couple things about Directory locking: the locks were only ever to 
prevent more than one IndexWriter from trying to modify the same index. The 
IndexWriter grabs a write lock once when it is created and does not release it 
until it is closed._ 

_Directories are not locked on acquisition of the Directory from the 
DirectoryFactory. See the IndexWriter constructor, where the lock is explicitly 
grabbed._

Review CorePushPull#pullUpdateFromBlob, ServerSideMetadata and other classes as 
relevant.

 






[jira] [Commented] (SOLR-13932) Review directory locking for Blob interactions

2019-11-14 Thread Ilan Ginzburg (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16974191#comment-16974191
 ] 

Ilan Ginzburg commented on SOLR-13932:
--

Commented code in {{ServerSideMetadata}} constructor regarding 
{{DirectoryReader.listCommits()}} mentions "But in blob context we serialize 
commits and pulls by proper locking therefore we should be good here". I 
believe this is no longer true (a previous Salesforce version of that code used 
to serialize updates). We do not serialize commits but let Lucene do it the way 
it wants (we serialize pushes to Blob).

OTOH, I don't see why we try to detect inconsistency using past files that are 
no longer part of the commit point. If we rely on such detections, we're at 
risk because files of past commit points can be deleted at any time. Our 
strategy must make sure (I believe it does) that local is always consistent, or 
local should be deleted and the Blob version adopted instead. The only time 
where local is "a source of truth" over blob is after indexing. Consistency in 
such a case is guaranteed by verifying that local was consistent with Blob 
before indexing started (blob being source of truth), and that Blob hasn't 
changed by the time indexing finished and got pushed back to Blob (allowing 
parallel indexing from given node/replica, this is done by the conditional 
update into ZK).
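The "conditional update into ZK" guard described above is optimistic concurrency: read a version before indexing, and let the final push win only if that version is unchanged. A minimal in-memory sketch of the semantics (an assumption-laden illustration, not the actual SolrCloud code; ZooKeeper's `setData(path, data, expectedVersion)` provides the same compare-and-set, failing with `BadVersionException` on a stale version):

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical model of the versioned-push guard. conditionalUpdate()
// plays the role of ZooKeeper setData() with an explicit expected version.
class VersionedStore {
    private final AtomicInteger version = new AtomicInteger(0);
    private volatile byte[] data = new byte[0];

    int read() { return version.get(); }

    // Succeeds only if nobody updated the metadata since expectedVersion was
    // read; a false return corresponds to KeeperException.BadVersionException,
    // telling the caller to re-pull from Blob before retrying.
    synchronized boolean conditionalUpdate(byte[] newData, int expectedVersion) {
        if (version.get() != expectedVersion) {
            return false;
        }
        data = newData;
        version.incrementAndGet();
        return true;
    }
}
```

Under this scheme two replicas may index in parallel, but only the first push for a given starting version lands; the loser must reconcile against the Blob copy and retry.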

> Review directory locking for Blob interactions
> --
>
> Key: SOLR-13932
> URL: https://issues.apache.org/jira/browse/SOLR-13932
> Project: Solr
>  Issue Type: Sub-task
>Reporter: Ilan Ginzburg
>Priority: Major
>
> Review resolution of local index directory content vs Blob copy.
> There has been a wrong understanding that the following line acquires a lock 
> on the index directory:
>  {{solrCore.getDirectoryFactory().get(indexDirPath, 
> DirectoryFactory.DirContext.DEFAULT, 
> solrCore.getSolrConfig().indexConfig.lockType);}}
> From Yonik:
> _A couple things about Directory locking: the locks were only ever to 
> prevent more than one IndexWriter from trying to modify the same index. The 
> IndexWriter grabs a write lock once when it is created and does not release 
> it until it is closed._ 
> _Directories are not locked on acquisition of the Directory from the 
> DirectoryFactory. See the IndexWriter constructor, where the lock is 
> explicitly grabbed._
> Review CorePushPull#pullUpdateFromBlob, ServerSideMetadata and other classes 
> as relevant.
>  






[jira] [Commented] (LUCENE-8920) Reduce size of FSTs due to use of direct-addressing encoding

2019-11-14 Thread Michael Sokolov (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16974198#comment-16974198
 ] 

Michael Sokolov commented on LUCENE-8920:
-

I scanned the diff of all the commits together, and I didn't see any issues, 
but boy that is a big change now. Thanks for handling the merge, [~jpountz]

> Reduce size of FSTs due to use of direct-addressing encoding 
> -
>
> Key: LUCENE-8920
> URL: https://issues.apache.org/jira/browse/LUCENE-8920
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael Sokolov
>Priority: Minor
> Fix For: 8.4
>
> Attachments: TestTermsDictRamBytesUsed.java
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> Some data can lead to worst-case ~4x RAM usage due to this optimization. 
> Several ideas were suggested to combat this on the mailing list:
> bq. I think we can improve the situation here by tracking, per-FST instance, 
> the size increase we're seeing while building (or perhaps do a preliminary 
> pass before building) in order to decide whether to apply the encoding. 
> bq. we could also make the encoding a bit more efficient. For instance I 
> noticed that arc metadata is pretty large in some cases (in the 10-20 bytes) 
> which make gaps very costly. Associating each label with a dense id and 
> having an intermediate lookup, ie. lookup label -> id and then id->arc offset 
> instead of doing label->arc directly could save a lot of space in some cases? 
> Also it seems that we are repeating the label in the arc metadata when 
> array-with-gaps is used, even though it shouldn't be necessary since the 
> label is implicit from the address?






[jira] [Comment Edited] (LUCENE-8920) Reduce size of FSTs due to use of direct-addressing encoding

2019-11-14 Thread Michael Sokolov (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16974198#comment-16974198
 ] 

Michael Sokolov edited comment on LUCENE-8920 at 11/14/19 12:38 PM:


I scanned the diff of all the commits together, and I didn't see any issues, 
but boy that is a big change now. Thanks for handling the merge, [~jpountz], 
and nice work on saving both space and time, [~bruno.roustant]!


was (Author: sokolov):
I scanned the diff of all the commits together, and I didn't see any issues, 
but boy that is a big change now. Thanks for handling the merge, [~jpountz]

> Reduce size of FSTs due to use of direct-addressing encoding 
> -
>
> Key: LUCENE-8920
> URL: https://issues.apache.org/jira/browse/LUCENE-8920
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael Sokolov
>Priority: Minor
> Fix For: 8.4
>
> Attachments: TestTermsDictRamBytesUsed.java
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> Some data can lead to worst-case ~4x RAM usage due to this optimization. 
> Several ideas were suggested to combat this on the mailing list:
> bq. I think we can improve the situation here by tracking, per-FST instance, 
> the size increase we're seeing while building (or perhaps do a preliminary 
> pass before building) in order to decide whether to apply the encoding. 
> bq. we could also make the encoding a bit more efficient. For instance I 
> noticed that arc metadata is pretty large in some cases (in the 10-20 bytes) 
> which make gaps very costly. Associating each label with a dense id and 
> having an intermediate lookup, ie. lookup label -> id and then id->arc offset 
> instead of doing label->arc directly could save a lot of space in some cases? 
> Also it seems that we are repeating the label in the arc metadata when 
> array-with-gaps is used, even though it shouldn't be necessary since the 
> label is implicit from the address?
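The label → dense id → arc offset idea from the quoted suggestion can be sketched like this (a hypothetical illustration under assumed data structures, not the actual FST code): gaps between sparse labels cost one small entry in the id table instead of a full-width arc slot.

```java
import java.util.Arrays;

// Hypothetical two-level lookup: sparse label -> dense id -> arc offset.
class DenseLabelLookup {
    private final int minLabel;
    private final int[] labelToId;      // -1 marks a gap (no arc for that label)
    private final long[] idToArcOffset; // dense, one entry per real arc

    DenseLabelLookup(int[] sortedLabels, long[] arcOffsets) {
        minLabel = sortedLabels[0];
        int maxLabel = sortedLabels[sortedLabels.length - 1];
        labelToId = new int[maxLabel - minLabel + 1];
        Arrays.fill(labelToId, -1);
        for (int id = 0; id < sortedLabels.length; id++) {
            labelToId[sortedLabels[id] - minLabel] = id;
        }
        idToArcOffset = arcOffsets;
    }

    // Returns the arc offset for label, or -1 if the node has no such arc.
    long arcOffset(int label) {
        if (label < minLabel || label - minLabel >= labelToId.length) return -1;
        int id = labelToId[label - minLabel];
        return id < 0 ? -1 : idToArcOffset[id];
    }
}
```

The space saving comes from the gap entries being a small int (or, with further packing, a bit plus a rank structure) rather than a full arc-metadata record.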






[jira] [Updated] (SOLR-13930) Fix failing TestKoreanTokenizer test in Gradle build

2019-11-14 Thread Erick Erickson (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson updated SOLR-13930:
--
Summary: Fix failing TestKoreanTokenizer test in Gradle build  (was: Fix 
failing TestKoreanTokenizer test)

> Fix failing TestKoreanTokenizer test in Gradle build
> 
>
> Key: SOLR-13930
> URL: https://issues.apache.org/jira/browse/SOLR-13930
> Project: Solr
>  Issue Type: Task
>  Security Level: Public(Default Security Level. Issues are Public) 
> Environment: This fails with:
> java.lang.RuntimeException: Cannot find userdict.txt in test classpath!
> userdict.txt gets copied when I test on the trunk branch to (at least I think 
> this is the corresponding one):
> ./lucene/build/analysis/nori/*classes*/test/org/apache/lucene/analysis/ko/userdict.txt
> So my presumption is that the ant build takes care of this and somehow the 
> classpath is set to include it.
> This is on a clean checkout of the current gradle_8 branch, _without_ trying 
> to do anything with Gradle.
>Reporter: Erick Erickson
>Priority: Major
>







[jira] [Commented] (SOLR-13930) Fix failing TestKoreanTokenizer test in Gradle build

2019-11-14 Thread Erick Erickson (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16974200#comment-16974200
 ] 

Erick Erickson commented on SOLR-13930:
---

Sorry, I caused you extra work; this is in the Gradle build, not the regular 
Ant build. It works fine in the regular Ant build.

I changed the title to make this more plain.

Thanks for looking, and again sorry for the ambiguity.

> Fix failing TestKoreanTokenizer test in Gradle build
> 
>
> Key: SOLR-13930
> URL: https://issues.apache.org/jira/browse/SOLR-13930
> Project: Solr
>  Issue Type: Task
>  Security Level: Public(Default Security Level. Issues are Public) 
> Environment: This fails with:
> java.lang.RuntimeException: Cannot find userdict.txt in test classpath!
> userdict.txt gets copied when I test on the trunk branch to (at least I think 
> this is the corresponding one):
> ./lucene/build/analysis/nori/*classes*/test/org/apache/lucene/analysis/ko/userdict.txt
> So my presumption is that the ant build takes care of this and somehow the 
> classpath is set to include it.
> This is on a clean checkout of the current gradle_8 branch, _without_ trying 
> to do anything with Gradle.
>Reporter: Erick Erickson
>Priority: Major
>







[jira] [Updated] (SOLR-13923) Test target (task?) should fail when no tests run in Gradle build

2019-11-14 Thread Erick Erickson (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson updated SOLR-13923:
--
Summary: Test target (task?) should fail when no tests run in Gradle build  
(was: Test target (task?) should fail when no tests run)

> Test target (task?) should fail when no tests run in Gradle build
> -
>
> Key: SOLR-13923
> URL: https://issues.apache.org/jira/browse/SOLR-13923
> Project: Solr
>  Issue Type: Task
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Build
>Reporter: Michael Sokolov
>Priority: Minor
>
> With the ant build if you try to test a nonexistent test case or method 
> ({{-Dtestcase=NoSuchThing}}, the build will fail; this is pretty helpful if 
> you make a lot of typos or forget the names of things. According to [~dweiss] 
> we can get this behavior in gradle by listening to the test results and 
> failing if no tests ran.






[jira] [Updated] (SOLR-13929) Reconcile parallel licenses and licenses_gradle trees in Gradle build

2019-11-14 Thread Erick Erickson (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson updated SOLR-13929:
--
Summary: Reconcile parallel licenses and licenses_gradle trees in Gradle 
build  (was: Reconcile parallel licenses and licenses_gradle trees)

> Reconcile parallel licenses and licenses_gradle trees in Gradle build
> -
>
> Key: SOLR-13929
> URL: https://issues.apache.org/jira/browse/SOLR-13929
> Project: Solr
>  Issue Type: Task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Erick Erickson
>Priority: Major
>
> I had a hard time making Gradle and Ant play nice together when they shared 
> the same license directory. Temporarily there are two, license and 
> license_gradle, both in the lucene and solr trees. When we remove Ant, we 
> need to reconcile this, probably by removing the two "license" directories.






[jira] [Commented] (SOLR-13930) Fix failing TestKoreanTokenizer test in Gradle build

2019-11-14 Thread Michael Sokolov (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16974203#comment-16974203
 ] 

Michael Sokolov commented on SOLR-13930:


I'll just note that {{TestJapaneseTokenizerTest}} does pretty much exactly the 
same thing -- yet it passes?

> Fix failing TestKoreanTokenizer test in Gradle build
> 
>
> Key: SOLR-13930
> URL: https://issues.apache.org/jira/browse/SOLR-13930
> Project: Solr
>  Issue Type: Task
>  Security Level: Public(Default Security Level. Issues are Public) 
> Environment: This fails with:
> java.lang.RuntimeException: Cannot find userdict.txt in test classpath!
> userdict.txt gets copied when I test on the trunk branch to (at least I think 
> this is the corresponding one):
> ./lucene/build/analysis/nori/*classes*/test/org/apache/lucene/analysis/ko/userdict.txt
> So my presumption is that the ant build takes care of this and somehow the 
> classpath is set to include it.
> This is on a clean checkout of the current gradle_8 branch, _without_ trying 
> to do anything with Gradle.
>Reporter: Erick Erickson
>Priority: Major
>







[jira] [Commented] (LUCENE-8997) Add type of triangle info to ShapeField encoding

2019-11-14 Thread Ignacio Vera (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16974215#comment-16974215
 ] 

Ignacio Vera commented on LUCENE-8997:
--

I would like to raise this issue again as I made a small improvement. I realised 
that for points I do not need to add the point information in the data dimensions, 
therefore I can just leave dimensions 5 and 6 empty. For BKD tree leaves that 
only contain points, this means they will compress very well.

I have run the Lucene geo benchmarks for LatLonShape and I got a reduction of 
the index size of 30%!

 
{code}
||Approach||Index time (sec)||Force merge time (sec)||Index size (GB)||Reader heap (MB)||
|| ||Dev||Base||Diff||Dev||Base||Diff||Dev||Base||Diff||Dev||Base||Diff||
|shapes|260.8s|264.2s|-1%|0.0s|0.0s|0%|0.89|1.27|-30%|1.14|1.78|-36%|
{code}

> Add type of triangle info to ShapeField encoding
> 
>
> Key: LUCENE-8997
> URL: https://issues.apache.org/jira/browse/LUCENE-8997
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Ignacio Vera
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We are currently encoding three types of triangles in ShapeField:
>  * POINT: all three coordinates are equal
>  * LINE: two coordinates are equal
>  * TRIANGLE: all coordinates are different
> Because we still have two unused bits, it might be worthwhile to encode this 
> information in those two bits as follows:
>  * 0 0 : Unknown so this is an index created before adding this information. 
> We can compute in this case the information while decoding for backwards 
> compatibility.
>  * 1 0: The encoded triangle is a POINT
>  * 0 1: The encoded triangle is a LINE
>  * 1 1: The encoded triangle is a TRIANGLE
> We can later leverage this information so we don't need to decode all 
> dimensions in the case of POINT and LINE. We are also currently computing the 
> type of triangle we are dealing with in some of the methods; this will go 
> away as well.
>  
>  
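The two-bit scheme above can be sketched as follows (a hypothetical illustration; the bit positions and the surrounding metadata layout are assumptions, not the actual ShapeField encoding):

```java
// Hypothetical two-bit triangle-type flag, using the mapping from the issue:
// 00 = unknown (pre-upgrade index), 10 = POINT, 01 = LINE, 11 = TRIANGLE.
class TriangleType {
    static final int UNKNOWN = 0b00, POINT = 0b10, LINE = 0b01, TRIANGLE = 0b11;

    // Derive the type from three encoded vertices, per the issue's definition:
    // POINT = all equal, LINE = exactly two equal, else TRIANGLE.
    static int typeOf(long a, long b, long c) {
        if (a == b && b == c) return POINT;
        if (a == b || b == c || a == c) return LINE;
        return TRIANGLE;
    }

    // Stash the type in the two spare low bits of a metadata word.
    static int encode(int metadata, int type) { return (metadata & ~0b11) | type; }

    static int decodeType(int encoded) { return encoded & 0b11; }
}
```

With the flag in place, a reader can skip decoding the redundant dimensions for POINT and LINE, while a 00 value falls back to recomputing the type for backwards compatibility.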






[jira] [Commented] (SOLR-13662) Package manager CLI

2019-11-14 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16974218#comment-16974218
 ] 

ASF subversion and git services commented on SOLR-13662:


Commit d9f41f8a5a31e7dd8f4ccee729d479ce07175c1a in lucene-solr's branch 
refs/heads/master from Ishan Chattopadhyaya
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=d9f41f8 ]

SOLR-13662: Package manager (CLI)


> Package manager CLI
> ---
>
> Key: SOLR-13662
> URL: https://issues.apache.org/jira/browse/SOLR-13662
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Noble Paul
>Assignee: Ishan Chattopadhyaya
>Priority: Major
> Attachments: plugin-cli.png
>
>  Time Spent: 12h 50m
>  Remaining Estimate: 0h
>
> Design details and usage details are here: 
> https://docs.google.com/document/d/15b3m3i3NFDKbhkhX_BN0MgvPGZaBj34TKNF2-UNC3U8/edit?ts=5d86a8ad#






[GitHub] [lucene-solr] chatman closed pull request #994: SOLR-13662: Package Manager (CLI)

2019-11-14 Thread GitBox
chatman closed pull request #994: SOLR-13662: Package Manager (CLI)
URL: https://github.com/apache/lucene-solr/pull/994
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] chatman commented on issue #994: SOLR-13662: Package Manager (CLI)

2019-11-14 Thread GitBox
chatman commented on issue #994: SOLR-13662: Package Manager (CLI)
URL: https://github.com/apache/lucene-solr/pull/994#issuecomment-553874860
 
 
   Merged, thanks. 
https://issues.apache.org/jira/browse/SOLR-13662?focusedCommentId=16974218&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16974218
   
   Thanks for all your reviews!


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-13930) Fix failing TestKoreanTokenizer test in Gradle build

2019-11-14 Thread Michael Sokolov (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16974238#comment-16974238
 ] 

Michael Sokolov commented on SOLR-13930:


I was able to run

./gradlew lucene:lucene-analyzers:lucene-analyzers-nori:test

successfully. What command/branch did you see the failure with, 
[~erickerickson]?

> Fix failing TestKoreanTokenizer test in Gradle build
> 
>
> Key: SOLR-13930
> URL: https://issues.apache.org/jira/browse/SOLR-13930
> Project: Solr
>  Issue Type: Task
>  Security Level: Public(Default Security Level. Issues are Public) 
> Environment: This fails with:
> java.lang.RuntimeException: Cannot find userdict.txt in test classpath!
> userdict.txt gets copied when I test on the trunk branch to (at least I think 
> this is the corresponding one):
> ./lucene/build/analysis/nori/*classes*/test/org/apache/lucene/analysis/ko/userdict.txt
> So my presumption is that the ant build takes care of this and somehow the 
> classpath is set to include it.
> This is on a clean checkout of the current gradle_8 branch, _without_ trying 
> to do anything with Gradle.
>Reporter: Erick Erickson
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8997) Add type of triangle info to ShapeField encoding

2019-11-14 Thread Adrien Grand (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16974239#comment-16974239
 ] 

Adrien Grand commented on LUCENE-8997:
--

I'm unsure about keeping dimensions empty: it works well if your index has only 
lines or only points since all points will have a value of 0 for certain 
dimensions. But if the index mixes triangles and points, then this could 
actually hurt?

> Add type of triangle info to ShapeField encoding
> 
>
> Key: LUCENE-8997
> URL: https://issues.apache.org/jira/browse/LUCENE-8997
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Ignacio Vera
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We are currently encoding three types of triangles in ShapeField:
>  * POINT: all three coordinates are equal
>  * LINE: two coordinates are equal
>  * TRIANGLE: all coordinates are different
> Because we still have two unused bits, it might be worthwhile to encode this 
> information in those two bits as follows:
>  * 0 0: Unknown, so this is an index created before adding this information. 
> We can compute the information in this case while decoding, for backwards 
> compatibility.
>  * 1 0: The encoded triangle is a POINT
>  * 0 1: The encoded triangle is a LINE
>  * 1 1: The encoded triangle is a TRIANGLE
> We can later leverage this information so we don't need to decode all 
> dimensions in the case of POINT and LINE. We are currently computing the type 
> of triangle we are dealing with in some of the methods; this will go away as 
> well.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
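The two-bit type scheme described in the quoted issue can be sketched as follows. This is an illustration only, not Lucene's actual ShapeField encoding: the class and method names, and the choice of packing the type into the two low bits of a metadata word, are assumptions; only the bit values (00 unknown, 10 point, 01 line, 11 triangle) come from the proposal above.

```java
// Sketch of the two spare-bit triangle-type encoding from LUCENE-8997.
// Hypothetical names; bit position within the metadata word is assumed.
public class TriangleTypeBits {
    enum Type { UNKNOWN, POINT, LINE, TRIANGLE }

    /** Pack the triangle type into the two low (spare) bits of the metadata. */
    static int encode(int metadata, Type t) {
        final int bits;
        switch (t) {
            case POINT:    bits = 0b10; break;
            case LINE:     bits = 0b01; break;
            case TRIANGLE: bits = 0b11; break;
            default:       bits = 0b00; break; // unknown / legacy index
        }
        return (metadata & ~0b11) | bits;
    }

    /** Read the type back; 00 means a pre-upgrade index, so the caller
     *  would recompute the type from the coordinates while decoding. */
    static Type decode(int metadata) {
        switch (metadata & 0b11) {
            case 0b10: return Type.POINT;
            case 0b01: return Type.LINE;
            case 0b11: return Type.TRIANGLE;
            default:   return Type.UNKNOWN;
        }
    }

    public static void main(String[] args) {
        int md = encode(0, Type.LINE);
        System.out.println(decode(md)); // LINE
    }
}
```

With the type known up front, a decoder can skip reading the remaining dimensions for POINT and LINE, which is the saving the comment above is after.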



[jira] [Commented] (LUCENE-8997) Add type of triangle info to ShapeField encoding

2019-11-14 Thread Adrien Grand (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16974241#comment-16974241
 ] 

Adrien Grand commented on LUCENE-8997:
--

I guess it could still work if we indexed this dimension, but I don't think 
this is the right trade-off.

> Add type of triangle info to ShapeField encoding
> 
>
> Key: LUCENE-8997
> URL: https://issues.apache.org/jira/browse/LUCENE-8997
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Ignacio Vera
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We are currently encoding three types of triangles in ShapeField:
>  * POINT: all three coordinates are equal
>  * LINE: two coordinates are equal
>  * TRIANGLE: all coordinates are different
> Because we still have two unused bits, it might be worthwhile to encode this 
> information in those two bits as follows:
>  * 0 0: Unknown, so this is an index created before adding this information. 
> We can compute the information in this case while decoding, for backwards 
> compatibility.
>  * 1 0: The encoded triangle is a POINT
>  * 0 1: The encoded triangle is a LINE
>  * 1 1: The encoded triangle is a TRIANGLE
> We can later leverage this information so we don't need to decode all 
> dimensions in the case of POINT and LINE. We are currently computing the type 
> of triangle we are dealing with in some of the methods; this will go away as 
> well.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Comment Edited] (LUCENE-8997) Add type of triangle info to ShapeField encoding

2019-11-14 Thread Ignacio Vera (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16974215#comment-16974215
 ] 

Ignacio Vera edited comment on LUCENE-8997 at 11/14/19 1:19 PM:


I would like to raise this issue again as I have made a small improvement. I 
realised that for points I do not need to add the point information for the 
data dimensions, so I can just leave dimensions 5 and 6 empty. For BKD tree 
leaves that contain only points, this means they will compress very well.

I have run the Lucene geo benchmarks for LatLonShape and got a 30% reduction 
in index size!

 
{code}
||Approach||Index time (sec)||Force merge time (sec)||Index size (GB)||Reader heap (MB)||
||Dev||Base||Diff||Dev||Base||Diff||Dev||Base||Diff||Dev||Base||Diff||
|shapes|244.7s|250.7s|-2%|0.0s|0.0s|0%|0.89|1.27|-30%|1.14|1.14|0%|
{code}


was (Author: ivera):
I would like to raise this issue again as I have made a small improvement. I 
realised that for points I do not need to add the point information for the 
data dimensions, so I can just leave dimensions 5 and 6 empty. For BKD tree 
leaves that contain only points, this means they will compress very well.

I have run the Lucene geo benchmarks for LatLonShape and got a 30% reduction 
in index size!

 
{code}
||Approach||Index time (sec)||Force merge time (sec)||Index size (GB)||Reader heap (MB)||
||Dev||Base||Diff||Dev||Base||Diff||Dev||Base||Diff||Dev||Base||Diff||
|shapes|260.8s|264.2s|-1%|0.0s|0.0s|0%|0.89|1.27|-30%|1.14|1.78|-36%|
{code}

> Add type of triangle info to ShapeField encoding
> 
>
> Key: LUCENE-8997
> URL: https://issues.apache.org/jira/browse/LUCENE-8997
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Ignacio Vera
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We are currently encoding three types of triangles in ShapeField:
>  * POINT: all three coordinates are equal
>  * LINE: two coordinates are equal
>  * TRIANGLE: all coordinates are different
> Because we still have two unused bits, it might be worthwhile to encode this 
> information in those two bits as follows:
>  * 0 0: Unknown, so this is an index created before adding this information. 
> We can compute the information in this case while decoding, for backwards 
> compatibility.
>  * 1 0: The encoded triangle is a POINT
>  * 0 1: The encoded triangle is a LINE
>  * 1 1: The encoded triangle is a TRIANGLE
> We can later leverage this information so we don't need to decode all 
> dimensions in the case of POINT and LINE. We are currently computing the type 
> of triangle we are dealing with in some of the methods; this will go away as 
> well.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-13930) Fix failing TestKoreanTokenizer test in Gradle build

2019-11-14 Thread Erick Erickson (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16974200#comment-16974200
 ] 

Erick Erickson edited comment on SOLR-13930 at 11/14/19 1:28 PM:
-

Sorry, I caused you extra work; this is the Ant test in the gradle_8 branch, 
not the regular Ant build on master, where it works fine.

I changed the title to make this plainer.

Thanks for looking, and again sorry for the ambiguity.


was (Author: erickerickson):
Sorry, I caused you extra work; this is in the Gradle build, not the regular 
Ant build, where it works fine.

I changed the title to make this plainer.

Thanks for looking, and again sorry for the ambiguity.

> Fix failing TestKoreanTokenizer test in Gradle build
> 
>
> Key: SOLR-13930
> URL: https://issues.apache.org/jira/browse/SOLR-13930
> Project: Solr
>  Issue Type: Task
>  Security Level: Public(Default Security Level. Issues are Public) 
> Environment: This fails with:
> java.lang.RuntimeException: Cannot find userdict.txt in test classpath!
> userdict.txt gets copied when I test on the trunk branch to (at least I think 
> this is the corresponding one):
> ./lucene/build/analysis/nori/*classes*/test/org/apache/lucene/analysis/ko/userdict.txt
> So my presumption is that the ant build takes care of this and somehow the 
> classpath is set to include it.
> This is on a clean checkout of the current gradle_8 branch, _without_ trying 
> to do anything with Gradle.
>Reporter: Erick Erickson
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8920) Reduce size of FSTs due to use of direct-addressing encoding

2019-11-14 Thread Adrien Grand (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16974250#comment-16974250
 ] 

Adrien Grand commented on LUCENE-8920:
--

Thanks for checking [~sokolov]!

> Reduce size of FSTs due to use of direct-addressing encoding 
> -
>
> Key: LUCENE-8920
> URL: https://issues.apache.org/jira/browse/LUCENE-8920
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael Sokolov
>Priority: Minor
> Fix For: 8.4
>
> Attachments: TestTermsDictRamBytesUsed.java
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> Some data can lead to worst-case ~4x RAM usage due to this optimization. 
> Several ideas were suggested to combat this on the mailing list:
bq. I think we can improve the situation here by tracking, per-FST instance, 
the size increase we're seeing while building (or perhaps do a preliminary 
pass before building) in order to decide whether to apply the encoding. 
bq. we could also make the encoding a bit more efficient. For instance I 
noticed that arc metadata is pretty large in some cases (in the 10-20 byte 
range), which makes gaps very costly. Associating each label with a dense id 
and having an intermediate lookup, i.e. lookup label -> id and then id -> arc 
offset, instead of doing label -> arc directly, could save a lot of space in 
some cases. Also it seems that we are repeating the label in the arc metadata 
when array-with-gaps is used, even though it shouldn't be necessary since the 
label is implicit from the address?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
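The label -> id -> arc-offset indirection suggested in the quoted issue can be sketched as below. This is not Lucene's FST code: the class, its fields, and the offsets are hypothetical, and a real implementation would use a more compact label index. The point is only that two far-apart labels cost two table slots instead of a gap-filled direct-addressing range.

```java
// Sketch of the suggested indirection: map each label to a dense id,
// then index a gap-free array of arc offsets by that id.
import java.util.Map;
import java.util.TreeMap;

public class DenseArcLookup {
    private final Map<Integer, Integer> labelToId = new TreeMap<>(); // label -> dense id
    private final long[] arcOffsets; // indexed by dense id, no gaps

    DenseArcLookup(int[] labels, long[] offsets) {
        for (int i = 0; i < labels.length; i++) {
            labelToId.put(labels[i], i); // dense ids assigned in label order
        }
        this.arcOffsets = offsets;
    }

    /** Returns the arc offset for a label, or -1 if the label has no arc. */
    long offsetFor(int label) {
        Integer id = labelToId.get(label);
        return id == null ? -1 : arcOffsets[id];
    }

    public static void main(String[] args) {
        // Labels 'a' and 'z' are 25 apart: a direct label->offset table would
        // carry 24 empty slots; the dense-id layout stores just two offsets.
        DenseArcLookup lookup = new DenseArcLookup(new int[] {'a', 'z'}, new long[] {100L, 228L});
        System.out.println(lookup.offsetFor('z')); // 228
    }
}
```

The trade-off, as the thread notes, is an extra lookup on the read path in exchange for removing the gaps (and the repeated labels) from the arc metadata.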



[jira] [Commented] (SOLR-13930) Fix failing TestKoreanTokenizer test in Gradle build

2019-11-14 Thread Erick Erickson (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16974251#comment-16974251
 ] 

Erick Erickson commented on SOLR-13930:
---

Dear Lord, sometimes you'd think I'd learn to put in complete details and save 
others wasting time when they try to help. Sorry about that.

The _ant_ build fails, not the Gradle test:

ant -Dtestcase=TestKoreanTokenizer test

Oddly, TestJapaneseTokenizer succeeds when run under Ant.

And thanks to all who are looking into these things. I'm trying to record 
things as I find them, so descriptions may be fragmentary, I'm afraid.

 

 

> Fix failing TestKoreanTokenizer test in Gradle build
> 
>
> Key: SOLR-13930
> URL: https://issues.apache.org/jira/browse/SOLR-13930
> Project: Solr
>  Issue Type: Task
>  Security Level: Public(Default Security Level. Issues are Public) 
> Environment: This fails with:
> java.lang.RuntimeException: Cannot find userdict.txt in test classpath!
> userdict.txt gets copied when I test on the trunk branch to (at least I think 
> this is the corresponding one):
> ./lucene/build/analysis/nori/*classes*/test/org/apache/lucene/analysis/ko/userdict.txt
> So my presumption is that the ant build takes care of this and somehow the 
> classpath is set to include it.
> This is on a clean checkout of the current gradle_8 branch, _without_ trying 
> to do anything with Gradle.
>Reporter: Erick Erickson
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8997) Add type of triangle info to ShapeField encoding

2019-11-14 Thread Ignacio Vera (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16974254#comment-16974254
 ] 

Ignacio Vera commented on LUCENE-8997:
--

I see your point, I revert that change.

> Add type of triangle info to ShapeField encoding
> 
>
> Key: LUCENE-8997
> URL: https://issues.apache.org/jira/browse/LUCENE-8997
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Ignacio Vera
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> We are currently encoding three types of triangles in ShapeField:
>  * POINT: all three coordinates are equal
>  * LINE: two coordinates are equal
>  * TRIANGLE: all coordinates are different
> Because we still have two unused bits, it might be worthwhile to encode this 
> information in those two bits as follows:
>  * 0 0: Unknown, so this is an index created before adding this information. 
> We can compute the information in this case while decoding, for backwards 
> compatibility.
>  * 1 0: The encoded triangle is a POINT
>  * 0 1: The encoded triangle is a LINE
>  * 1 1: The encoded triangle is a TRIANGLE
> We can later leverage this information so we don't need to decode all 
> dimensions in the case of POINT and LINE. We are currently computing the type 
> of triangle we are dealing with in some of the methods; this will go away as 
> well.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (SOLR-13817) Deprecate legacy SolrCache implementations

2019-11-14 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki updated SOLR-13817:

Attachment: SOLR-13817-master.patch

> Deprecate legacy SolrCache implementations
> --
>
> Key: SOLR-13817
> URL: https://issues.apache.org/jira/browse/SOLR-13817
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
> Attachments: SOLR-13817-master.patch
>
>
> Now that SOLR-8241 has been committed I propose to deprecate other cache 
> implementations in 8x and remove them altogether from 9.0, in order to reduce 
> confusion and maintenance costs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (SOLR-13930) Running TestKoreanTokenizer with Ant fails in gradle_8 build

2019-11-14 Thread Erick Erickson (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson updated SOLR-13930:
--
Summary: Running TestKoreanTokenizer with Ant fails in gradle_8 build  
(was: Fix failing TestKoreanTokenizer test in Gradle build)

> Running TestKoreanTokenizer with Ant fails in gradle_8 build
> -
>
> Key: SOLR-13930
> URL: https://issues.apache.org/jira/browse/SOLR-13930
> Project: Solr
>  Issue Type: Task
>  Security Level: Public(Default Security Level. Issues are Public) 
> Environment: This fails with:
> java.lang.RuntimeException: Cannot find userdict.txt in test classpath!
> userdict.txt gets copied when I test on the trunk branch to (at least I think 
> this is the corresponding one):
> ./lucene/build/analysis/nori/*classes*/test/org/apache/lucene/analysis/ko/userdict.txt
> So my presumption is that the ant build takes care of this and somehow the 
> classpath is set to include it.
> This is on a clean checkout of the current gradle_8 branch, _without_ trying 
> to do anything with Gradle.
>Reporter: Erick Erickson
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-13817) Deprecate legacy SolrCache implementations

2019-11-14 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16974256#comment-16974256
 ] 

Andrzej Bialecki commented on SOLR-13817:
-

Patch relative to master. It removes all traces of {{LRUCache, LFUCache, 
FastLRUCache}} from sources, configs and documentation and replaces all cache 
configs with {{CaffeineCache}}.

Tests are still passing, which is nice ;)

> Deprecate legacy SolrCache implementations
> --
>
> Key: SOLR-13817
> URL: https://issues.apache.org/jira/browse/SOLR-13817
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
> Attachments: SOLR-13817-master.patch
>
>
> Now that SOLR-8241 has been committed I propose to deprecate other cache 
> implementations in 8x and remove them altogether from 9.0, in order to reduce 
> confusion and maintenance costs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9020) Find a way to publish Solr RefGuide without checking into git

2019-11-14 Thread Uwe Schindler (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16974264#comment-16974264
 ] 

Uwe Schindler commented on LUCENE-9020:
---

Sorry, was too busy. Here it is: INFRA-19439

> Find a way to publish Solr RefGuide without checking into git
> -
>
> Key: LUCENE-9020
> URL: https://issues.apache.org/jira/browse/LUCENE-9020
> Project: Lucene - Core
>  Issue Type: Sub-task
>Reporter: Jan Høydahl
>Priority: Major
>
> Currently we check in all versions of RefGuide (hundreds of small html files) 
> into svn to publish as part of the site. With new site we should find a 
> smoother way to do this.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] jpountz commented on issue #889: LUCENE-8983: Add PhraseWildcardQuery to control multi-terms expansions in a phrase

2019-11-14 Thread GitBox
jpountz commented on issue #889: LUCENE-8983: Add PhraseWildcardQuery to 
control multi-terms expansions in a phrase
URL: https://github.com/apache/lucene-solr/pull/889#issuecomment-553893391
 
 
   This is a bit too esoteric for lucene/core in my opinion; would it work for 
you if we had it in lucene/sandbox?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-13662) Package manager CLI

2019-11-14 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16974272#comment-16974272
 ] 

ASF subversion and git services commented on SOLR-13662:


Commit 6edbda74291fa9fabb5e6cdc1141e799b738f5ef in lucene-solr's branch 
refs/heads/branch_8x from Ishan Chattopadhyaya
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=6edbda7 ]

SOLR-13662: Package manager (CLI)


> Package manager CLI
> ---
>
> Key: SOLR-13662
> URL: https://issues.apache.org/jira/browse/SOLR-13662
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Noble Paul
>Assignee: Ishan Chattopadhyaya
>Priority: Major
> Attachments: plugin-cli.png
>
>  Time Spent: 13h 10m
>  Remaining Estimate: 0h
>
> Design details and usage details are here: 
> https://docs.google.com/document/d/15b3m3i3NFDKbhkhX_BN0MgvPGZaBj34TKNF2-UNC3U8/edit?ts=5d86a8ad#



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] mkhludnev opened a new pull request #1011: LUCENE-9031: Just highlight term intervals and their combinations.

2019-11-14 Thread GitBox
mkhludnev opened a new pull request #1011: LUCENE-9031: Just highlight term 
intervals and their combinations.
URL: https://github.com/apache/lucene-solr/pull/1011
 
 
   
   
   
   # Description
   
   Please provide a short description of the changes you're making with this 
pull request.
   
   # Solution
   
   Please provide a short description of the approach taken to implement your 
solution.
   
   # Tests
   
   Please describe the tests you've developed or run to confirm this patch 
implements the feature or solves the problem.
   
   # Checklist
   
   Please review the following and check all that apply:
   
   - [ ] I have reviewed the guidelines for [How to 
Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms 
to the standards described there to the best of my ability.
   - [ ] I have created a Jira issue and added the issue ID to my pull request 
title.
   - [ ] I am authorized to contribute this code to the ASF and have removed 
any code I do not have a license to distribute.
   - [ ] I have given Solr maintainers 
[access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork)
 to contribute to my PR branch. (optional but recommended)
   - [ ] I have developed this patch against the `master` branch.
   - [ ] I have run `ant precommit` and the appropriate test suite.
   - [ ] I have added tests for my changes.
   - [ ] I have added documentation for the [Ref 
Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) 
(for Solr changes only).
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (LUCENE-9031) UnsupportedOperationException on highlighting Interval Query

2019-11-14 Thread Mikhail Khludnev (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Khludnev updated LUCENE-9031:
-
Attachment: LUCENE-9031.patch
Status: Patch Available  (was: Patch Available)

Starting from scratch, limiting to simple term intervals only:
https://github.com/apache/lucene-solr/pull/1011

> UnsupportedOperationException on highlighting Interval Query
> 
>
> Key: LUCENE-9031
> URL: https://issues.apache.org/jira/browse/LUCENE-9031
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: modules/queries
>Reporter: Mikhail Khludnev
>Assignee: Mikhail Khludnev
>Priority: Major
> Fix For: 8.4
>
> Attachments: LUCENE-9031.patch, LUCENE-9031.patch, LUCENE-9031.patch, 
> LUCENE-9031.patch, LUCENE-9031.patch, LUCENE-9031.patch, LUCENE-9031.patch
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> When UnifiedHighlighter highlights Interval Query it encounters 
> UnsupportedOperationException. 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (SOLR-13817) Deprecate legacy SolrCache implementations

2019-11-14 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki updated SOLR-13817:

Attachment: SOLR-13817-8x.patch

> Deprecate legacy SolrCache implementations
> --
>
> Key: SOLR-13817
> URL: https://issues.apache.org/jira/browse/SOLR-13817
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
> Attachments: SOLR-13817-8x.patch, SOLR-13817-master.patch
>
>
> Now that SOLR-8241 has been committed I propose to deprecate other cache 
> implementations in 8x and remove them altogether from 9.0, in order to reduce 
> confusion and maintenance costs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-13817) Deprecate legacy SolrCache implementations

2019-11-14 Thread Andrzej Bialecki (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13817?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16974283#comment-16974283
 ] 

Andrzej Bialecki commented on SOLR-13817:
-

Patch for branch_8x to add @deprecation tags and switch the default config 
(when {{class=...}} attribute is missing) to {{CaffeineCache}}.

> Deprecate legacy SolrCache implementations
> --
>
> Key: SOLR-13817
> URL: https://issues.apache.org/jira/browse/SOLR-13817
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
> Attachments: SOLR-13817-8x.patch, SOLR-13817-master.patch
>
>
> Now that SOLR-8241 has been committed I propose to deprecate other cache 
> implementations in 8x and remove them altogether from 9.0, in order to reduce 
> confusion and maintenance costs.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-13662) Package manager CLI

2019-11-14 Thread Ishan Chattopadhyaya (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16974298#comment-16974298
 ] 

Ishan Chattopadhyaya commented on SOLR-13662:
-

I'll make the ref guide changes in another PR soon.

> Package manager CLI
> ---
>
> Key: SOLR-13662
> URL: https://issues.apache.org/jira/browse/SOLR-13662
> Project: Solr
>  Issue Type: Sub-task
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Noble Paul
>Assignee: Ishan Chattopadhyaya
>Priority: Major
> Attachments: plugin-cli.png
>
>  Time Spent: 13h 10m
>  Remaining Estimate: 0h
>
> Design details and usage details are here: 
> https://docs.google.com/document/d/15b3m3i3NFDKbhkhX_BN0MgvPGZaBj34TKNF2-UNC3U8/edit?ts=5d86a8ad#



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-13930) Running TestKoreanTokenizer with Ant fails in gradle_8 build

2019-11-14 Thread Pinkesh Sharma (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16974300#comment-16974300
 ] 

Pinkesh Sharma commented on SOLR-13930:
---

Hey Erick, I tried building this with Gradle, and it seems like the tests and 
the build are passing.

I was running the build with:

./gradlew lucene:lucene-analyzers:lucene-analyzers-nori:test

> Running TestKoreanTokenizer with Ant fails in gradle_8 build
> -
>
> Key: SOLR-13930
> URL: https://issues.apache.org/jira/browse/SOLR-13930
> Project: Solr
>  Issue Type: Task
>  Security Level: Public(Default Security Level. Issues are Public) 
> Environment: This fails with:
> java.lang.RuntimeException: Cannot find userdict.txt in test classpath!
> userdict.txt gets copied when I test on the trunk branch to (at least I think 
> this is the corresponding one):
> ./lucene/build/analysis/nori/*classes*/test/org/apache/lucene/analysis/ko/userdict.txt
> So my presumption is that the ant build takes care of this and somehow the 
> classpath is set to include it.
> This is on a clean checkout of the current gradle_8 branch, _without_ trying 
> to do anything with Gradle.
>Reporter: Erick Erickson
>Priority: Major
>
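The failure above boils down to a classpath-relative resource lookup: the test asks the class loader for userdict.txt, so the build must copy that file next to the compiled test classes. A minimal sketch of the mechanism (hypothetical names; this is not the actual TestKoreanTokenizer code):

```java
import java.io.InputStream;

public class ClasspathResourceDemo {
    // Resolves "name" relative to the anchor class's package on the classpath,
    // which is exactly how a test would locate a bundled userdict.txt.
    static InputStream open(Class<?> anchor, String name) {
        return anchor.getResourceAsStream(name);
    }

    public static void main(String[] args) {
        // ".class" resources are always visible, even from named modules (JDK 9+),
        // so this lookup succeeds regardless of build-system copying.
        InputStream in = open(String.class, "String.class");
        if (in == null) {
            // This is the situation the test hits when the build does not copy
            // the resource onto the test classpath.
            throw new RuntimeException("Cannot find resource in test classpath!");
        }
        System.out.println("found");
    }
}
```

If the Gradle build does not replicate Ant's resource-copy step, `open(...)` returns null and the test fails with the "Cannot find userdict.txt" error quoted above.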







[GitHub] [lucene-solr] pgerber commented on issue #374: [SOLR-12334] Improve detection of recreated lockfiles

2019-11-14 Thread GitBox
pgerber commented on issue #374: [SOLR-12334] Improve detection of recreated 
lockfiles
URL: https://github.com/apache/lucene-solr/pull/374#issuecomment-553917907
 
 
   Closing; I no longer intend to do this.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] pgerber closed pull request #374: [SOLR-12334] Improve detection of recreated lockfiles

2019-11-14 Thread GitBox
pgerber closed pull request #374: [SOLR-12334] Improve detection of recreated 
lockfiles
URL: https://github.com/apache/lucene-solr/pull/374
 
 
   





[jira] [Commented] (LUCENE-8920) Reduce size of FSTs due to use of direct-addressing encoding

2019-11-14 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16974386#comment-16974386
 ] 

David Smiley commented on LUCENE-8920:
--

I want to confirm we have back-compat handled. Do we? A very quick look at 
the code shows we bumped the FST version, and I see the FST's constructor 
accepts the previous version. But will _it actually work_ -- will this Lucene 
8.4 code read FSTs written in previous indexes correctly? I know we have some 
back-compat indices, but I don't recall when that is validated (on each test or 
only on release?)

> Reduce size of FSTs due to use of direct-addressing encoding 
> -
>
> Key: LUCENE-8920
> URL: https://issues.apache.org/jira/browse/LUCENE-8920
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael Sokolov
>Priority: Minor
> Fix For: 8.4
>
> Attachments: TestTermsDictRamBytesUsed.java
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> Some data can lead to worst-case ~4x RAM usage due to this optimization. 
> Several ideas were suggested to combat this on the mailing list:
> bq. I think we can improve the situation here by tracking, per-FST instance, 
> the size increase we're seeing while building (or perhaps do a preliminary 
> pass before building) in order to decide whether to apply the encoding. 
> bq. we could also make the encoding a bit more efficient. For instance I 
> noticed that arc metadata is pretty large in some cases (in the 10-20 bytes) 
> which make gaps very costly. Associating each label with a dense id and 
> having an intermediate lookup, ie. lookup label -> id and then id->arc offset 
> instead of doing label->arc directly could save a lot of space in some cases? 
> Also it seems that we are repeating the label in the arc metadata when 
> array-with-gaps is used, even though it shouldn't be necessary since the 
> label is implicit from the address?
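The space/time trade-off quoted above can be sketched in plain Java (illustrative only; Lucene's actual arc encoding and its constants differ):

```java
import java.util.Arrays;
import java.util.BitSet;

public class DirectAddressDemo {
    // Direct addressing: O(1) label -> slot, but it allocates a slot for every
    // label in [minLabel, maxLabel], which is where the ~4x worst-case RAM
    // blow-up described in this issue comes from when labels are sparse.
    static int directLookup(int minLabel, BitSet present, int label) {
        int slot = label - minLabel;
        return (slot >= 0 && present.get(slot)) ? slot : -1;
    }

    // Binary search over the sorted labels: O(log n) per lookup, but only n
    // entries are stored, so sparse label sets cost no extra space.
    static int binaryLookup(int[] labels, int label) {
        return Arrays.binarySearch(labels, label);
    }

    public static void main(String[] args) {
        int[] labels = {'a', 'f', 'z'};   // 3 arcs, but 26 direct slots needed
        BitSet present = new BitSet();
        for (int l : labels) present.set(l - 'a');
        System.out.println(directLookup('a', present, 'f')); // 5
        System.out.println(binaryLookup(labels, 'f'));       // 1
    }
}
```

The "dense id" idea from the quote would add an intermediate label-to-id table so the addressed array has no gaps, trading one extra lookup for the wasted slots.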






[jira] [Commented] (LUCENE-8920) Reduce size of FSTs due to use of direct-addressing encoding

2019-11-14 Thread Bruno Roustant (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16974400#comment-16974400
 ] 

Bruno Roustant commented on LUCENE-8920:


{quote}I want to confirm we have back-compat handled. Do we?
{quote}
I'm pretty sure we are back-compatible. We introduce a new node type based on a 
new value of the node flags. The new code should read previous FSTs, and should 
write new FSTs with new direct-addressing nodes. That said, I'm interested to 
know when it is validated automatically, too.
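Flag-based dispatch of the kind described can be sketched as follows (the flag constants and decode logic here are illustrative assumptions, not Lucene's actual FST code):

```java
public class ArcFlagDemo {
    // Hypothetical flag layout: one bit per node encoding.
    static final int BIT_ARCS_AS_ARRAY = 1 << 0;   // old: binary-search array
    static final int BIT_ARCS_DIRECT   = 1 << 1;   // new: direct addressing

    // A reader that dispatches on the node's flag byte keeps decoding old
    // nodes unchanged while also understanding the new encoding, which is why
    // adding a new flag value preserves back-compatibility.
    static String decode(int flags) {
        if ((flags & BIT_ARCS_DIRECT) != 0) return "direct";
        if ((flags & BIT_ARCS_AS_ARRAY) != 0) return "array";
        return "linear";
    }

    public static void main(String[] args) {
        System.out.println(decode(BIT_ARCS_AS_ARRAY)); // array (old index)
        System.out.println(decode(BIT_ARCS_DIRECT));   // direct (new index)
    }
}
```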

 

> Reduce size of FSTs due to use of direct-addressing encoding 
> -
>
> Key: LUCENE-8920
> URL: https://issues.apache.org/jira/browse/LUCENE-8920
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael Sokolov
>Priority: Minor
> Fix For: 8.4
>
> Attachments: TestTermsDictRamBytesUsed.java
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> Some data can lead to worst-case ~4x RAM usage due to this optimization. 
> Several ideas were suggested to combat this on the mailing list:
> bq. I think we can improve the situation here by tracking, per-FST instance, 
> the size increase we're seeing while building (or perhaps do a preliminary 
> pass before building) in order to decide whether to apply the encoding. 
> bq. we could also make the encoding a bit more efficient. For instance I 
> noticed that arc metadata is pretty large in some cases (in the 10-20 bytes) 
> which make gaps very costly. Associating each label with a dense id and 
> having an intermediate lookup, ie. lookup label -> id and then id->arc offset 
> instead of doing label->arc directly could save a lot of space in some cases? 
> Also it seems that we are repeating the label in the arc metadata when 
> array-with-gaps is used, even though it shouldn't be necessary since the 
> label is implicit from the address?






[jira] [Created] (LUCENE-9049) Remove FST cachedRootArcs now redundant with direct-addressing

2019-11-14 Thread Bruno Roustant (Jira)
Bruno Roustant created LUCENE-9049:
--

 Summary: Remove FST cachedRootArcs now redundant with 
direct-addressing
 Key: LUCENE-9049
 URL: https://issues.apache.org/jira/browse/LUCENE-9049
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Bruno Roustant


With LUCENE-8920, FST most often encodes top-level nodes with direct addressing 
(instead of an array for binary search). This probably makes the cachedRootArcs 
redundant, so they should be removed, which will reduce the code.






[jira] [Commented] (LUCENE-8920) Reduce size of FSTs due to use of direct-addressing encoding

2019-11-14 Thread Bruno Roustant (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16974408#comment-16974408
 ] 

Bruno Roustant commented on LUCENE-8920:


{quote}There were a few conflicts when backporting to branch_8x, so you might 
want to take a second look
{quote}
I verified branch_8x as well; that seems good to me.

I created the follow-up item for removing cachedRootArcs: LUCENE-9049.

> Reduce size of FSTs due to use of direct-addressing encoding 
> -
>
> Key: LUCENE-8920
> URL: https://issues.apache.org/jira/browse/LUCENE-8920
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael Sokolov
>Priority: Minor
> Fix For: 8.4
>
> Attachments: TestTermsDictRamBytesUsed.java
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> Some data can lead to worst-case ~4x RAM usage due to this optimization. 
> Several ideas were suggested to combat this on the mailing list:
> bq. I think we can improve the situation here by tracking, per-FST instance, 
> the size increase we're seeing while building (or perhaps do a preliminary 
> pass before building) in order to decide whether to apply the encoding. 
> bq. we could also make the encoding a bit more efficient. For instance I 
> noticed that arc metadata is pretty large in some cases (in the 10-20 bytes) 
> which make gaps very costly. Associating each label with a dense id and 
> having an intermediate lookup, ie. lookup label -> id and then id->arc offset 
> instead of doing label->arc directly could save a lot of space in some cases? 
> Also it seems that we are repeating the label in the arc metadata when 
> array-with-gaps is used, even though it shouldn't be necessary since the 
> label is implicit from the address?






[jira] [Commented] (LUCENE-8920) Reduce size of FSTs due to use of direct-addressing encoding

2019-11-14 Thread Michael Sokolov (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16974410#comment-16974410
 ] 

Michael Sokolov commented on LUCENE-8920:
-

I had tested with the previous version of this patch, and yes, I also believe 
this preserves back-compat, since the old arc encoding is read as before, but 
there is no automated testing to verify it. It would be wise to run some manual 
spot-checking. We could, e.g., build an "old" index with luceneutil and then run 
its tests with that index after upgrading the code. Or any test that runs on an 
existing index should do - is there a more convenient one? 

> Reduce size of FSTs due to use of direct-addressing encoding 
> -
>
> Key: LUCENE-8920
> URL: https://issues.apache.org/jira/browse/LUCENE-8920
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael Sokolov
>Priority: Minor
> Fix For: 8.4
>
> Attachments: TestTermsDictRamBytesUsed.java
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> Some data can lead to worst-case ~4x RAM usage due to this optimization. 
> Several ideas were suggested to combat this on the mailing list:
> bq. I think we can improve the situation here by tracking, per-FST instance, 
> the size increase we're seeing while building (or perhaps do a preliminary 
> pass before building) in order to decide whether to apply the encoding. 
> bq. we could also make the encoding a bit more efficient. For instance I 
> noticed that arc metadata is pretty large in some cases (in the 10-20 bytes) 
> which make gaps very costly. Associating each label with a dense id and 
> having an intermediate lookup, ie. lookup label -> id and then id->arc offset 
> instead of doing label->arc directly could save a lot of space in some cases? 
> Also it seems that we are repeating the label in the arc metadata when 
> array-with-gaps is used, even though it shouldn't be necessary since the 
> label is implicit from the address?






[GitHub] [lucene-solr] janhoy commented on issue #994: SOLR-13662: Package Manager (CLI)

2019-11-14 Thread GitBox
janhoy commented on issue #994: SOLR-13662: Package Manager (CLI)
URL: https://github.com/apache/lucene-solr/pull/994#issuecomment-553970009
 
 
   >I just didn't feel like tackling the bouncy castle dependency at the moment
   Java 11 has strong crypto included out of the box, so it should be easier. We 
can add that in master and not backport to 8.





[GitHub] [lucene-solr] bruno-roustant commented on issue #889: LUCENE-8983: Add PhraseWildcardQuery to control multi-terms expansions in a phrase

2019-11-14 Thread GitBox
bruno-roustant commented on issue #889: LUCENE-8983: Add PhraseWildcardQuery to 
control multi-terms expansions in a phrase
URL: https://github.com/apache/lucene-solr/pull/889#issuecomment-553972639
 
 
   Sandbox is fine for me, yes.
   I'll push a commit soon to fix the precommit here and I'll move it to 
sandbox.





[jira] [Commented] (LUCENE-8983) PhraseWildcardQuery - new query to control and optimize wildcard expansions in phrase

2019-11-14 Thread Bruno Roustant (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16974420#comment-16974420
 ] 

Bruno Roustant commented on LUCENE-8983:


[~klaporte] did you try this PhraseWildcardQuery? Do you have some feedback 
about it?

We will probably move it to lucene/sandbox.

> PhraseWildcardQuery - new query to control and optimize wildcard expansions 
> in phrase
> -
>
> Key: LUCENE-8983
> URL: https://issues.apache.org/jira/browse/LUCENE-8983
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Reporter: Bruno Roustant
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> A generalized version of PhraseQuery, built with one or more MultiTermQuery 
> that provides term expansions for multi-terms (one of the expanded terms must 
> match).
> Its main advantage is to control the total number of expansions across all 
> MultiTermQuery and across all segments.
>  This query is similar to MultiPhraseQuery, but it handles, controls and 
> optimizes the multi-term expansions.
>  
>  This query is equivalent to building an ordered SpanNearQuery with a list of 
> SpanTermQuery and SpanMultiTermQueryWrapper.
>  But it optimizes the multi-term expansions and the segment accesses.
> It first resolves the single terms, stopping early if any of them does not 
> match. Then it expands each multi-term sequentially, stopping immediately if 
> one does not match. It detects the segments that do not match and skips them 
> for the next expansions. This often avoids expanding the other multi-terms on 
> some or even all segments. Finally, it controls the total number of expansions.
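The sequential expansion with early stopping and a global budget described in this issue can be sketched in plain Java (a hypothetical illustration; the method and the prefix-based matching are assumptions, not the actual PhraseWildcardQuery code):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;

public class ExpansionBudgetDemo {
    // Expands each prefix pattern sequentially against per-segment term sets,
    // returns the empty list as soon as one pattern matches nothing, skips
    // segments that stopped matching, and caps the total number of expansions.
    static List<String> expand(List<Set<String>> segments, List<String> prefixes,
                               int maxExpansions) {
        List<String> expansions = new ArrayList<>();
        List<Set<String>> live = new ArrayList<>(segments); // segments still matching
        for (String prefix : prefixes) {
            List<Set<String>> next = new ArrayList<>();
            boolean matchedAnywhere = false;
            for (Set<String> segment : live) {
                List<String> hits = new ArrayList<>();
                for (String term : segment) {
                    if (term.startsWith(prefix)) hits.add(term);
                }
                if (!hits.isEmpty()) {
                    next.add(segment);          // keep for the next pattern
                    matchedAnywhere = true;
                    expansions.addAll(hits);
                    if (expansions.size() > maxExpansions) {
                        throw new IllegalStateException("too many expansions");
                    }
                }
                // non-matching segments are dropped, so later patterns skip them
            }
            if (!matchedAnywhere) return Collections.emptyList(); // early stop
            live = next;
        }
        return expansions;
    }
}
```

A SpanNearQuery over SpanMultiTermQueryWrapper clauses would expand each multi-term independently per segment; the point of the control above is that one failed pattern, or one exhausted budget, ends the whole process immediately.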






[GitHub] [lucene-solr] janhoy edited a comment on issue #994: SOLR-13662: Package Manager (CLI)

2019-11-14 Thread GitBox
janhoy edited a comment on issue #994: SOLR-13662: Package Manager (CLI)
URL: https://github.com/apache/lucene-solr/pull/994#issuecomment-553970009
 
 
   >I just didn't feel like tackling the bouncy castle dependency at the moment
   
   Java 11 has strong crypto included out of the box, so it should be easier. We 
can add that in master and not backport to 8.





[GitHub] [lucene-solr] janhoy commented on a change in pull request #994: SOLR-13662: Package Manager (CLI)

2019-11-14 Thread GitBox
janhoy commented on a change in pull request #994: SOLR-13662: Package Manager 
(CLI)
URL: https://github.com/apache/lucene-solr/pull/994#discussion_r346426522
 
 

 ##
 File path: 
solr/core/src/java/org/apache/solr/packagemanager/RepositoryManager.java
 ##
 @@ -0,0 +1,328 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.packagemanager;
+
+import static org.apache.solr.packagemanager.PackageUtils.getMapper;
+
+import java.io.IOException;
+import java.io.UnsupportedEncodingException;
+import java.lang.invoke.MethodHandles;
+import java.net.MalformedURLException;
+import java.net.URL;
+import java.nio.ByteBuffer;
+import java.nio.file.Path;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.stream.Collectors;
+
+import org.apache.commons.io.FileUtils;
+import org.apache.commons.io.IOUtils;
+import org.apache.lucene.util.Version;
+import org.apache.solr.client.solrj.SolrRequest;
+import org.apache.solr.client.solrj.SolrServerException;
+import org.apache.solr.client.solrj.impl.HttpSolrClient;
+import org.apache.solr.client.solrj.request.V2Request;
+import org.apache.solr.client.solrj.request.beans.Package;
+import org.apache.solr.client.solrj.response.V2Response;
+import org.apache.solr.common.SolrException;
+import org.apache.solr.common.SolrException.ErrorCode;
+import org.apache.solr.common.cloud.SolrZkClient;
+import org.apache.solr.core.BlobRepository;
+import org.apache.solr.packagemanager.SolrPackage.Artifact;
+import org.apache.solr.packagemanager.SolrPackage.SolrPackageRelease;
+import org.apache.solr.pkg.PackageAPI;
+import org.apache.zookeeper.CreateMode;
+import org.apache.zookeeper.KeeperException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+/**
+ * Handles most of the management of repositories and packages present in 
external repositories.
+ */
+public class RepositoryManager {
+
+  private static final Logger log = 
LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
+
+  final private PackageManager packageManager;
+
+  public static final String systemVersion = Version.LATEST.toString();
+
+  final HttpSolrClient solrClient;
+
+  public RepositoryManager(HttpSolrClient solrClient, PackageManager 
packageManager) {
+this.packageManager = packageManager;
+this.solrClient = solrClient;
+  }
+
+  public List<SolrPackage> getPackages() {
+List<SolrPackage> list = new ArrayList<>(getPackagesMap().values());
+Collections.sort(list);
+return list;
+  }
+
+  /**
+   * Get a map of package name to {@link SolrPackage} objects
+   */
+  public Map<String, SolrPackage> getPackagesMap() {
+Map<String, SolrPackage> packagesMap = new HashMap<>();
+for (PackageRepository repository: getRepositories()) {
+  packagesMap.putAll(repository.getPackages());
+}
+
+return packagesMap;
+  }
+
+  /**
+   * List of added repositories
+   */
+  public List<PackageRepository> getRepositories() {
+// TODO: Instead of fetching again and again, we should look for caching 
this
+PackageRepository items[];
+try {
+  items = 
getMapper().readValue(getRepositoriesJson(packageManager.zkClient), 
DefaultPackageRepository[].class);
+} catch (IOException | KeeperException | InterruptedException e) {
+  throw new SolrException(ErrorCode.SERVER_ERROR, e);
+}
+List<PackageRepository> repositories = Arrays.asList(items);
+
+for (PackageRepository updateRepository: repositories) {
+  updateRepository.refresh();
+}
+
+return repositories;
+  }
+
+  /**
+   * Add a repository to Solr
+   */
+  public void addRepository(String name, String uri) throws KeeperException, 
InterruptedException, MalformedURLException, IOException {
+String existingRepositoriesJson = 
getRepositoriesJson(packageManager.zkClient);
+log.info(existingRepositoriesJson);
+
+List repos = getMapper().readValue(existingRepositoriesJson, List.class);
+repos.add(new DefaultPackageRepository(name, uri));
+if (packageManager.zkClient.exists("/repositories.json", true) == false) {
+  packageManager.zkClient.create("/repositories.json", 
getMapper().writeValueAsString(repos).getBytes("UTF-8"), CreateMode.PERSISTENT, 

[GitHub] [lucene-solr] janhoy commented on a change in pull request #994: SOLR-13662: Package Manager (CLI)

2019-11-14 Thread GitBox
janhoy commented on a change in pull request #994: SOLR-13662: Package Manager 
(CLI)
URL: https://github.com/apache/lucene-solr/pull/994#discussion_r346427550
 
 

 ##
 File path: 
solr/core/src/java/org/apache/solr/packagemanager/RepositoryManager.java
 ##
 @@ -0,0 +1,328 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.packagemanager;
+
+import static org.apache.solr.packagemanager.PackageUtils.getMapper;
+
+import java.io.IOException;
+import java.io.UnsupportedEncodingException;
+import java.lang.invoke.MethodHandles;
+import java.net.MalformedURLException;
+import java.net.URL;
+import java.nio.ByteBuffer;
+import java.nio.file.Path;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.stream.Collectors;
+
+import org.apache.commons.io.FileUtils;
+import org.apache.commons.io.IOUtils;
+import org.apache.lucene.util.Version;
+import org.apache.solr.client.solrj.SolrRequest;
+import org.apache.solr.client.solrj.SolrServerException;
+import org.apache.solr.client.solrj.impl.HttpSolrClient;
+import org.apache.solr.client.solrj.request.V2Request;
+import org.apache.solr.client.solrj.request.beans.Package;
+import org.apache.solr.client.solrj.response.V2Response;
+import org.apache.solr.common.SolrException;
+import org.apache.solr.common.SolrException.ErrorCode;
+import org.apache.solr.common.cloud.SolrZkClient;
+import org.apache.solr.core.BlobRepository;
+import org.apache.solr.packagemanager.SolrPackage.Artifact;
+import org.apache.solr.packagemanager.SolrPackage.SolrPackageRelease;
+import org.apache.solr.pkg.PackageAPI;
+import org.apache.zookeeper.CreateMode;
+import org.apache.zookeeper.KeeperException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+/**
+ * Handles most of the management of repositories and packages present in 
external repositories.
+ */
+public class RepositoryManager {
+
+  private static final Logger log = 
LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
+
+  final private PackageManager packageManager;
+
+  public static final String systemVersion = Version.LATEST.toString();
+
+  final HttpSolrClient solrClient;
+
+  public RepositoryManager(HttpSolrClient solrClient, PackageManager 
packageManager) {
+this.packageManager = packageManager;
+this.solrClient = solrClient;
+  }
+
+  public List<SolrPackage> getPackages() {
+List<SolrPackage> list = new ArrayList<>(getPackagesMap().values());
+Collections.sort(list);
+return list;
+  }
+
+  /**
+   * Get a map of package name to {@link SolrPackage} objects
+   */
+  public Map<String, SolrPackage> getPackagesMap() {
+Map<String, SolrPackage> packagesMap = new HashMap<>();
+for (PackageRepository repository: getRepositories()) {
+  packagesMap.putAll(repository.getPackages());
+}
+
+return packagesMap;
+  }
+
+  /**
+   * List of added repositories
+   */
+  public List<PackageRepository> getRepositories() {
+// TODO: Instead of fetching again and again, we should look for caching 
this
+PackageRepository items[];
+try {
+  items = 
getMapper().readValue(getRepositoriesJson(packageManager.zkClient), 
DefaultPackageRepository[].class);
+} catch (IOException | KeeperException | InterruptedException e) {
+  throw new SolrException(ErrorCode.SERVER_ERROR, e);
+}
+List<PackageRepository> repositories = Arrays.asList(items);
+
+for (PackageRepository updateRepository: repositories) {
+  updateRepository.refresh();
+}
+
+return repositories;
+  }
+
+  /**
+   * Add a repository to Solr
+   */
+  public void addRepository(String name, String uri) throws KeeperException, 
InterruptedException, MalformedURLException, IOException {
+String existingRepositoriesJson = 
getRepositoriesJson(packageManager.zkClient);
+log.info(existingRepositoriesJson);
+
+List repos = getMapper().readValue(existingRepositoriesJson, List.class);
+repos.add(new DefaultPackageRepository(name, uri));
+if (packageManager.zkClient.exists("/repositories.json", true) == false) {
+  packageManager.zkClient.create("/repositories.json", 
getMapper().writeValueAsString(repos).getBytes("UTF-8"), CreateMode.PERSISTENT, 

[GitHub] [lucene-solr] chatman commented on a change in pull request #994: SOLR-13662: Package Manager (CLI)

2019-11-14 Thread GitBox
chatman commented on a change in pull request #994: SOLR-13662: Package Manager 
(CLI)
URL: https://github.com/apache/lucene-solr/pull/994#discussion_r346429399
 
 

 ##
 File path: 
solr/core/src/java/org/apache/solr/packagemanager/RepositoryManager.java
 ##
 @@ -0,0 +1,328 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.packagemanager;
+
+import static org.apache.solr.packagemanager.PackageUtils.getMapper;
+
+import java.io.IOException;
+import java.io.UnsupportedEncodingException;
+import java.lang.invoke.MethodHandles;
+import java.net.MalformedURLException;
+import java.net.URL;
+import java.nio.ByteBuffer;
+import java.nio.file.Path;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.stream.Collectors;
+
+import org.apache.commons.io.FileUtils;
+import org.apache.commons.io.IOUtils;
+import org.apache.lucene.util.Version;
+import org.apache.solr.client.solrj.SolrRequest;
+import org.apache.solr.client.solrj.SolrServerException;
+import org.apache.solr.client.solrj.impl.HttpSolrClient;
+import org.apache.solr.client.solrj.request.V2Request;
+import org.apache.solr.client.solrj.request.beans.Package;
+import org.apache.solr.client.solrj.response.V2Response;
+import org.apache.solr.common.SolrException;
+import org.apache.solr.common.SolrException.ErrorCode;
+import org.apache.solr.common.cloud.SolrZkClient;
+import org.apache.solr.core.BlobRepository;
+import org.apache.solr.packagemanager.SolrPackage.Artifact;
+import org.apache.solr.packagemanager.SolrPackage.SolrPackageRelease;
+import org.apache.solr.pkg.PackageAPI;
+import org.apache.zookeeper.CreateMode;
+import org.apache.zookeeper.KeeperException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+/**
+ * Handles most of the management of repositories and packages present in 
external repositories.
+ */
+public class RepositoryManager {
+
+  private static final Logger log = 
LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
+
+  final private PackageManager packageManager;
+
+  public static final String systemVersion = Version.LATEST.toString();
+
+  final HttpSolrClient solrClient;
+
+  public RepositoryManager(HttpSolrClient solrClient, PackageManager 
packageManager) {
+this.packageManager = packageManager;
+this.solrClient = solrClient;
+  }
+
+  public List<SolrPackage> getPackages() {
+List<SolrPackage> list = new ArrayList<>(getPackagesMap().values());
+Collections.sort(list);
+return list;
+  }
+
+  /**
+   * Get a map of package name to {@link SolrPackage} objects
+   */
+  public Map<String, SolrPackage> getPackagesMap() {
+Map<String, SolrPackage> packagesMap = new HashMap<>();
+for (PackageRepository repository: getRepositories()) {
+  packagesMap.putAll(repository.getPackages());
+}
+
+return packagesMap;
+  }
+
+  /**
+   * List of added repositories
+   */
+  public List<PackageRepository> getRepositories() {
+// TODO: Instead of fetching again and again, we should look for caching 
this
+PackageRepository items[];
+try {
+  items = 
getMapper().readValue(getRepositoriesJson(packageManager.zkClient), 
DefaultPackageRepository[].class);
+} catch (IOException | KeeperException | InterruptedException e) {
+  throw new SolrException(ErrorCode.SERVER_ERROR, e);
+}
+List<PackageRepository> repositories = Arrays.asList(items);
+
+for (PackageRepository updateRepository: repositories) {
+  updateRepository.refresh();
+}
+
+return repositories;
+  }
+
+  /**
+   * Add a repository to Solr
+   */
+  public void addRepository(String name, String uri) throws KeeperException, 
InterruptedException, MalformedURLException, IOException {
+String existingRepositoriesJson = 
getRepositoriesJson(packageManager.zkClient);
+log.info(existingRepositoriesJson);
+
+List repos = getMapper().readValue(existingRepositoriesJson, List.class);
+repos.add(new DefaultPackageRepository(name, uri));
+if (packageManager.zkClient.exists("/repositories.json", true) == false) {
+  packageManager.zkClient.create("/repositories.json", 
getMapper().writeValueAsString(repos).getBytes("UTF-8"), CreateMode.PERSISTENT, 

[GitHub] [lucene-solr] chatman commented on a change in pull request #994: SOLR-13662: Package Manager (CLI)

2019-11-14 Thread GitBox
chatman commented on a change in pull request #994: SOLR-13662: Package Manager 
(CLI)
URL: https://github.com/apache/lucene-solr/pull/994#discussion_r346430220
 
 

 ##
 File path: solr/core/src/java/org/apache/solr/util/PackageTool.java
 ##
 @@ -0,0 +1,255 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.solr.util;
+
+import java.lang.invoke.MethodHandles;
+import java.util.Map;
+
+import org.apache.commons.cli.CommandLine;
+import org.apache.commons.cli.Option;
+import org.apache.commons.cli.OptionBuilder;
+import org.apache.http.impl.client.CloseableHttpClient;
+import org.apache.logging.log4j.Level;
+import org.apache.logging.log4j.core.config.Configurator;
+import org.apache.lucene.util.SuppressForbidden;
+import org.apache.solr.client.solrj.impl.HttpClientUtil;
+import org.apache.solr.client.solrj.impl.HttpSolrClient;
+import org.apache.solr.packagemanager.PackageManager;
+import org.apache.solr.packagemanager.PackageUtils;
+import org.apache.solr.packagemanager.RepositoryManager;
+import org.apache.solr.packagemanager.SolrPackageInstance;
+import org.apache.solr.util.SolrCLI.StatusTool;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+
+@SuppressForbidden(reason = "Need to use System.out.println() instead of 
log4j/slf4j for cleaner output")
+public class PackageTool extends SolrCLI.ToolBase {
+
+  private static final Logger log = 
LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
+
+  @SuppressForbidden(reason = "Need to turn off logging, and SLF4J doesn't 
seem to provide for a way.")
+  public PackageTool() {
+// Need a logging free, clean output going through to the user.
+Configurator.setRootLevel(Level.OFF);
+  }
+
+  @Override
+  public String getName() {
+return "package";
+  }
+
+  public static String solrUrl = null;
+  public static String solrBaseUrl = null;
+  public PackageManager packageManager;
+  public RepositoryManager repositoryManager;
+
+  @Override
+  protected void runImpl(CommandLine cli) throws Exception {
+try {
+  solrUrl = 
cli.getOptionValues("solrUrl")[cli.getOptionValues("solrUrl").length-1];
+  solrBaseUrl = solrUrl.replaceAll("\\/solr$", ""); // strip out ending 
"/solr"
+  log.info("Solr url: "+solrUrl+", solr base url: "+solrBaseUrl);
+  String zkHost = getZkHost(cli);
+
+  log.info("ZK: "+zkHost);
+  String cmd = cli.getArgList().size() == 0? "help": cli.getArgs()[0];
+
+  try (HttpSolrClient solrClient = new 
HttpSolrClient.Builder(solrBaseUrl).build()) {
+if (cmd != null) {
+  packageManager = new PackageManager(solrClient, solrBaseUrl, 
zkHost); 
+  try {
+repositoryManager = new RepositoryManager(solrClient, 
packageManager);
+
+switch (cmd) {
+  case "add-repo":
+repositoryManager.addRepository(cli.getArgs()[1], 
cli.getArgs()[2]);
+break;
+  case "list-installed":
+packageManager.listInstalled();
+break;
+  case "list-available":
+repositoryManager.listAvailable();
+break;
+  case "list-deployed":
+if (cli.hasOption('c')) {
+  String collection = cli.getArgs()[1];
+  Map packages = 
packageManager.getPackagesDeployed(collection);
+  PackageUtils.printGreen("Packages deployed on " + collection 
+ ":");
+  for (String packageName: packages.keySet()) {
+PackageUtils.printGreen("\t" + packages.get(packageName)); 

+  }
+} else {
+  String packageName = cli.getArgs()[1];
+  Map deployedCollections = 
packageManager.getDeployedCollections(packageName);
+  PackageUtils.printGreen("Collections on which package " + 
packageName + " was deployed:");
+  for (String collection: deployedCollections.keySet()) {
+PackageUtils.printGreen("\t" + collection + 
"("+packageName+":"+deployedCollections.get(collection)+")");
+  }
+}
+break;
+   
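runImpl above derives the base URL by stripping a trailing "/solr" path segment with a regex. A small standalone check of that normalization (the class name is hypothetical; the behavior matches the replaceAll call quoted in the review):

```java
public class SolrUrlDemo {
    public static void main(String[] args) {
        // Mirrors the normalization in PackageTool.runImpl:
        // strip a single trailing "/solr" path segment, if present.
        String solrUrl = "http://localhost:8983/solr";
        String solrBaseUrl = solrUrl.replaceAll("/solr$", "");
        System.out.println(solrBaseUrl); // prints http://localhost:8983

        // URLs without the suffix pass through unchanged.
        System.out.println("http://localhost:8983".replaceAll("/solr$", ""));
    }
}
```

Note that the `$` anchor means only a suffix is removed, so an interior "/solr" segment is left intact.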

[GitHub] [lucene-solr] chatman commented on a change in pull request #994: SOLR-13662: Package Manager (CLI)

2019-11-14 Thread GitBox
chatman commented on a change in pull request #994: SOLR-13662: Package Manager 
(CLI)
URL: https://github.com/apache/lucene-solr/pull/994#discussion_r346429995
 
 

 ##
 File path: 
solr/core/src/java/org/apache/solr/packagemanager/RepositoryManager.java
 ##
 @@ -0,0 +1,328 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.packagemanager;
+
+import static org.apache.solr.packagemanager.PackageUtils.getMapper;
+
+import java.io.IOException;
+import java.io.UnsupportedEncodingException;
+import java.lang.invoke.MethodHandles;
+import java.net.MalformedURLException;
+import java.net.URL;
+import java.nio.ByteBuffer;
+import java.nio.file.Path;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.stream.Collectors;
+
+import org.apache.commons.io.FileUtils;
+import org.apache.commons.io.IOUtils;
+import org.apache.lucene.util.Version;
+import org.apache.solr.client.solrj.SolrRequest;
+import org.apache.solr.client.solrj.SolrServerException;
+import org.apache.solr.client.solrj.impl.HttpSolrClient;
+import org.apache.solr.client.solrj.request.V2Request;
+import org.apache.solr.client.solrj.request.beans.Package;
+import org.apache.solr.client.solrj.response.V2Response;
+import org.apache.solr.common.SolrException;
+import org.apache.solr.common.SolrException.ErrorCode;
+import org.apache.solr.common.cloud.SolrZkClient;
+import org.apache.solr.core.BlobRepository;
+import org.apache.solr.packagemanager.SolrPackage.Artifact;
+import org.apache.solr.packagemanager.SolrPackage.SolrPackageRelease;
+import org.apache.solr.pkg.PackageAPI;
+import org.apache.zookeeper.CreateMode;
+import org.apache.zookeeper.KeeperException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+/**
+ * Handles most of the management of repositories and packages present in 
external repositories.
+ */
+public class RepositoryManager {
+
+  private static final Logger log = 
LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
+
+  final private PackageManager packageManager;
+
+  public static final String systemVersion = Version.LATEST.toString();
+
+  final HttpSolrClient solrClient;
+
+  public RepositoryManager(HttpSolrClient solrClient, PackageManager 
packageManager) {
+this.packageManager = packageManager;
+this.solrClient = solrClient;
+  }
+
+  public List<SolrPackage> getPackages() {
+List<SolrPackage> list = new ArrayList<>(getPackagesMap().values());
+Collections.sort(list);
+return list;
+  }
+
+  /**
+   * Get a map of package name to {@link SolrPackage} objects
+   */
+  public Map<String, SolrPackage> getPackagesMap() {
+Map<String, SolrPackage> packagesMap = new HashMap<>();
+for (PackageRepository repository: getRepositories()) {
+  packagesMap.putAll(repository.getPackages());
+}
+
+return packagesMap;
+  }
+
+  /**
+   * List of added repositories
+   */
+  public List<PackageRepository> getRepositories() {
+// TODO: Instead of fetching again and again, we should look for caching 
this
+PackageRepository items[];
+try {
+  items = 
getMapper().readValue(getRepositoriesJson(packageManager.zkClient), 
DefaultPackageRepository[].class);
+} catch (IOException | KeeperException | InterruptedException e) {
+  throw new SolrException(ErrorCode.SERVER_ERROR, e);
+}
+List<PackageRepository> repositories = Arrays.asList(items);
+
+for (PackageRepository updateRepository: repositories) {
+  updateRepository.refresh();
+}
+
+return repositories;
+  }
+
+  /**
+   * Add a repository to Solr
+   */
+  public void addRepository(String name, String uri) throws KeeperException, 
InterruptedException, MalformedURLException, IOException {
+String existingRepositoriesJson = 
getRepositoriesJson(packageManager.zkClient);
+log.info(existingRepositoriesJson);
+
+List repos = getMapper().readValue(existingRepositoriesJson, List.class);
+repos.add(new DefaultPackageRepository(name, uri));
+if (packageManager.zkClient.exists("/repositories.json", true) == false) {
+  packageManager.zkClient.create("/repositories.json", 
getMapper().writeValueAsString(repos).getBytes("UTF-8"), CreateMode.PERSISTENT, 
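For illustration, addRepository above appends a DefaultPackageRepository built from (name, uri) to the list serialized at /repositories.json. The stored document might look roughly like this (the field names are an assumption for illustration, not taken from the source):

```json
[
  { "name": "community", "url": "https://example.org/repo/repository.json" }
]
```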

[GitHub] [lucene-solr] chatman commented on a change in pull request #994: SOLR-13662: Package Manager (CLI)

2019-11-14 Thread GitBox
chatman commented on a change in pull request #994: SOLR-13662: Package Manager 
(CLI)
URL: https://github.com/apache/lucene-solr/pull/994#discussion_r346430128
 
 

 ##
 File path: solr/core/src/java/org/apache/solr/util/PackageTool.java
 ##
 @@ -0,0 +1,255 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.solr.util;
+
+import java.lang.invoke.MethodHandles;
+import java.util.Map;
+
+import org.apache.commons.cli.CommandLine;
+import org.apache.commons.cli.Option;
+import org.apache.commons.cli.OptionBuilder;
+import org.apache.http.impl.client.CloseableHttpClient;
+import org.apache.logging.log4j.Level;
+import org.apache.logging.log4j.core.config.Configurator;
+import org.apache.lucene.util.SuppressForbidden;
+import org.apache.solr.client.solrj.impl.HttpClientUtil;
+import org.apache.solr.client.solrj.impl.HttpSolrClient;
+import org.apache.solr.packagemanager.PackageManager;
+import org.apache.solr.packagemanager.PackageUtils;
+import org.apache.solr.packagemanager.RepositoryManager;
+import org.apache.solr.packagemanager.SolrPackageInstance;
+import org.apache.solr.util.SolrCLI.StatusTool;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+
+@SuppressForbidden(reason = "Need to use System.out.println() instead of 
log4j/slf4j for cleaner output")
+public class PackageTool extends SolrCLI.ToolBase {
+
+  private static final Logger log = 
LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
+
+  @SuppressForbidden(reason = "Need to turn off logging, and SLF4J doesn't 
seem to provide for a way.")
+  public PackageTool() {
+// Need a logging free, clean output going through to the user.
+Configurator.setRootLevel(Level.OFF);
+  }
+
+  @Override
+  public String getName() {
+return "package";
+  }
+
+  public static String solrUrl = null;
+  public static String solrBaseUrl = null;
+  public PackageManager packageManager;
+  public RepositoryManager repositoryManager;
+
+  @Override
+  protected void runImpl(CommandLine cli) throws Exception {
+try {
+  solrUrl = 
cli.getOptionValues("solrUrl")[cli.getOptionValues("solrUrl").length-1];
+  solrBaseUrl = solrUrl.replaceAll("\\/solr$", ""); // strip out ending 
"/solr"
+  log.info("Solr url: "+solrUrl+", solr base url: "+solrBaseUrl);
+  String zkHost = getZkHost(cli);
+
+  log.info("ZK: "+zkHost);
+  String cmd = cli.getArgList().size() == 0? "help": cli.getArgs()[0];
+
+  try (HttpSolrClient solrClient = new 
HttpSolrClient.Builder(solrBaseUrl).build()) {
+if (cmd != null) {
+  packageManager = new PackageManager(solrClient, solrBaseUrl, 
zkHost); 
+  try {
+repositoryManager = new RepositoryManager(solrClient, 
packageManager);
+
+switch (cmd) {
+  case "add-repo":
+repositoryManager.addRepository(cli.getArgs()[1], 
cli.getArgs()[2]);
+break;
+  case "list-installed":
+packageManager.listInstalled();
+break;
+  case "list-available":
+repositoryManager.listAvailable();
+break;
+  case "list-deployed":
+if (cli.hasOption('c')) {
+  String collection = cli.getArgs()[1];
+  Map packages = 
packageManager.getPackagesDeployed(collection);
+  PackageUtils.printGreen("Packages deployed on " + collection 
+ ":");
+  for (String packageName: packages.keySet()) {
+PackageUtils.printGreen("\t" + packages.get(packageName)); 

+  }
+} else {
+  String packageName = cli.getArgs()[1];
+  Map deployedCollections = 
packageManager.getDeployedCollections(packageName);
+  PackageUtils.printGreen("Collections on which package " + 
packageName + " was deployed:");
+  for (String collection: deployedCollections.keySet()) {
+PackageUtils.printGreen("\t" + collection + 
"("+packageName+":"+deployedCollections.get(collection)+")");
+  }
+}
+break;
+   

[GitHub] [lucene-solr] chatman commented on issue #994: SOLR-13662: Package Manager (CLI)

2019-11-14 Thread GitBox
chatman commented on issue #994: SOLR-13662: Package Manager (CLI)
URL: https://github.com/apache/lucene-solr/pull/994#issuecomment-553978870
 
 
   Sounds good. If we get GPG support in JDK or some decent library, it will be 
great.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Assigned] (SOLR-10317) Solr Nightly Benchmarks

2019-11-14 Thread Ishan Chattopadhyaya (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-10317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ishan Chattopadhyaya reassigned SOLR-10317:
---

Assignee: Ishan Chattopadhyaya

> Solr Nightly Benchmarks
> ---
>
> Key: SOLR-10317
> URL: https://issues.apache.org/jira/browse/SOLR-10317
> Project: Solr
>  Issue Type: Task
>Reporter: Ishan Chattopadhyaya
>Assignee: Ishan Chattopadhyaya
>Priority: Major
>  Labels: gsoc2017, mentor
> Attachments: 
> Narang-Vivek-SOLR-10317-Solr-Nightly-Benchmarks-FINAL-PROPOSAL.pdf, 
> Narang-Vivek-SOLR-10317-Solr-Nightly-Benchmarks.docx, SOLR-10317.patch, 
> SOLR-10317.patch, Screenshot from 2017-07-30 20-30-05.png, 
> changes-lucene-20160907.json, changes-solr-20160907.json, managed-schema, 
> solrconfig.xml
>
>
> Currently hosted at: http://212.47.242.214/MergedViewCloud.html
> 
> Solr needs nightly benchmarks reporting. Similar Lucene benchmarks can be 
> found here, https://home.apache.org/~mikemccand/lucenebench/.
> Preferably, we need:
> # A suite of benchmarks that build Solr from a commit point, start Solr 
> nodes, both in SolrCloud and standalone mode, and record timing information 
> of various operations like indexing, querying, faceting, grouping, 
> replication etc.
> # It should be possible to run them either as an independent suite or as a 
> Jenkins job, and we should be able to report timings as graphs (Jenkins has 
> some charting plugins).
> # The code should eventually be integrated in the Solr codebase, so that it 
> never goes out of date.
> There is some prior work / discussion:
> # https://github.com/shalinmangar/solr-perf-tools (Shalin)
> # https://github.com/chatman/solr-upgrade-tests/blob/master/BENCHMARKS.md 
> (Ishan/Vivek)
> # SOLR-2646 & SOLR-9863 (Mark Miller)
> # https://home.apache.org/~mikemccand/lucenebench/ (Mike McCandless)
> # https://github.com/lucidworks/solr-scale-tk (Tim Potter)
> Some of the frameworks above support building, starting, indexing/querying, 
> and stopping Solr. However, the benchmarks they run are very limited. Any of 
> these can be a starting point, or a new framework can be used instead. The 
> motivation is to cover every functionality of Solr with a corresponding 
> benchmark that is run every night.
> Proposing this as a GSoC 2017 project. I'm willing to mentor, and I'm sure 
> [~shalinmangar] and [~markrmil...@gmail.com] would help here.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)




[GitHub] [lucene-solr] janhoy commented on a change in pull request #994: SOLR-13662: Package Manager (CLI)

2019-11-14 Thread GitBox
janhoy commented on a change in pull request #994: SOLR-13662: Package Manager 
(CLI)
URL: https://github.com/apache/lucene-solr/pull/994#discussion_r346436605
 
 

 ##
 File path: 
solr/core/src/java/org/apache/solr/packagemanager/RepositoryManager.java
 ##
 @@ -0,0 +1,328 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.packagemanager;
+
+import static org.apache.solr.packagemanager.PackageUtils.getMapper;
+
+import java.io.IOException;
+import java.io.UnsupportedEncodingException;
+import java.lang.invoke.MethodHandles;
+import java.net.MalformedURLException;
+import java.net.URL;
+import java.nio.ByteBuffer;
+import java.nio.file.Path;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.stream.Collectors;
+
+import org.apache.commons.io.FileUtils;
+import org.apache.commons.io.IOUtils;
+import org.apache.lucene.util.Version;
+import org.apache.solr.client.solrj.SolrRequest;
+import org.apache.solr.client.solrj.SolrServerException;
+import org.apache.solr.client.solrj.impl.HttpSolrClient;
+import org.apache.solr.client.solrj.request.V2Request;
+import org.apache.solr.client.solrj.request.beans.Package;
+import org.apache.solr.client.solrj.response.V2Response;
+import org.apache.solr.common.SolrException;
+import org.apache.solr.common.SolrException.ErrorCode;
+import org.apache.solr.common.cloud.SolrZkClient;
+import org.apache.solr.core.BlobRepository;
+import org.apache.solr.packagemanager.SolrPackage.Artifact;
+import org.apache.solr.packagemanager.SolrPackage.SolrPackageRelease;
+import org.apache.solr.pkg.PackageAPI;
+import org.apache.zookeeper.CreateMode;
+import org.apache.zookeeper.KeeperException;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+/**
+ * Handles most of the management of repositories and packages present in 
external repositories.
+ */
+public class RepositoryManager {
+
+  private static final Logger log = 
LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
+
+  final private PackageManager packageManager;
+
+  public static final String systemVersion = Version.LATEST.toString();
+
+  final HttpSolrClient solrClient;
+
+  public RepositoryManager(HttpSolrClient solrClient, PackageManager 
packageManager) {
+this.packageManager = packageManager;
+this.solrClient = solrClient;
+  }
+
+  public List<SolrPackage> getPackages() {
+List<SolrPackage> list = new ArrayList<>(getPackagesMap().values());
+Collections.sort(list);
+return list;
+  }
+
+  /**
+   * Get a map of package name to {@link SolrPackage} objects
+   */
+  public Map<String, SolrPackage> getPackagesMap() {
+Map<String, SolrPackage> packagesMap = new HashMap<>();
+for (PackageRepository repository: getRepositories()) {
+  packagesMap.putAll(repository.getPackages());
+}
+
+return packagesMap;
+  }
+
+  /**
+   * List of added repositories
+   */
+  public List<PackageRepository> getRepositories() {
+// TODO: Instead of fetching again and again, we should look for caching 
this
+PackageRepository items[];
+try {
+  items = 
getMapper().readValue(getRepositoriesJson(packageManager.zkClient), 
DefaultPackageRepository[].class);
+} catch (IOException | KeeperException | InterruptedException e) {
+  throw new SolrException(ErrorCode.SERVER_ERROR, e);
+}
+List<PackageRepository> repositories = Arrays.asList(items);
+
+for (PackageRepository updateRepository: repositories) {
+  updateRepository.refresh();
+}
+
+return repositories;
+  }
+
+  /**
+   * Add a repository to Solr
+   */
+  public void addRepository(String name, String uri) throws KeeperException, 
InterruptedException, MalformedURLException, IOException {
+String existingRepositoriesJson = 
getRepositoriesJson(packageManager.zkClient);
+log.info(existingRepositoriesJson);
+
+List repos = getMapper().readValue(existingRepositoriesJson, List.class);
+repos.add(new DefaultPackageRepository(name, uri));
+if (packageManager.zkClient.exists("/repositories.json", true) == false) {
+  packageManager.zkClient.create("/repositories.json", 
getMapper().writeValueAsString(repos).getBytes("UTF-8"), CreateMode.PERSISTENT, 

[jira] [Created] (SOLR-13933) Cluster mode Stress test suite

2019-11-14 Thread Ishan Chattopadhyaya (Jira)
Ishan Chattopadhyaya created SOLR-13933:
---

 Summary: Cluster mode Stress test suite 
 Key: SOLR-13933
 URL: https://issues.apache.org/jira/browse/SOLR-13933
 Project: Solr
  Issue Type: Bug
  Security Level: Public (Default Security Level. Issues are Public)
Reporter: Ishan Chattopadhyaya
Assignee: Ishan Chattopadhyaya


We need a stress test harness based on 10s or 100s of nodes, 1000s of 
collection API operations, overseer operations etc. This suite should run 
nightly and help with:
# Uncover stability problems
# Benchmarking (timings, resource metrics etc.) on collection operations
# Indexing/querying performance

References:
SOLR-10317
https://github.com/lucidworks/solr-scale-tk
https://github.com/shalinmangar/solr-perf-tools
Lucene benchmarks






[jira] [Updated] (SOLR-13933) Cluster mode Stress test suite

2019-11-14 Thread Ishan Chattopadhyaya (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ishan Chattopadhyaya updated SOLR-13933:

Description: 
We need a stress test harness based on 10s or 100s of nodes, 1000s of 
collection API operations, overseer operations etc. This suite should run 
nightly and help with:
# Uncover stability problems
# Benchmarking (timings, resource metrics etc.) on collection operations
# Indexing/querying performance
# Validate the accuracy of potential improvements

References:
SOLR-10317
https://github.com/lucidworks/solr-scale-tk
https://github.com/shalinmangar/solr-perf-tools
Lucene benchmarks

  was:
We need a stress test harness based on 10s or 100s of nodes, 1000s of 
collection API operations, overseer operations etc. This suite should run 
nightly and help with:
# Uncover stability problems
# Benchmarking (timings, resource metrics etc.) on collection operations
# Indexing/querying performance

References:
SOLR-10317
https://github.com/lucidworks/solr-scale-tk
https://github.com/shalinmangar/solr-perf-tools
Lucene benchmarks


> Cluster mode Stress test suite 
> ---
>
> Key: SOLR-13933
> URL: https://issues.apache.org/jira/browse/SOLR-13933
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Ishan Chattopadhyaya
>Assignee: Ishan Chattopadhyaya
>Priority: Major
>
> We need a stress test harness based on 10s or 100s of nodes, 1000s of 
> collection API operations, overseer operations etc. This suite should run 
> nightly and help with:
> # Uncover stability problems
> # Benchmarking (timings, resource metrics etc.) on collection operations
> # Indexing/querying performance
> # Validate the accuracy of potential improvements
> References:
> SOLR-10317
> https://github.com/lucidworks/solr-scale-tk
> https://github.com/shalinmangar/solr-perf-tools
> Lucene benchmarks






[jira] [Updated] (SOLR-13817) Deprecate legacy SolrCache implementations

2019-11-14 Thread Andrzej Bialecki (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki updated SOLR-13817:

Fix Version/s: 8.4

> Deprecate legacy SolrCache implementations
> --
>
> Key: SOLR-13817
> URL: https://issues.apache.org/jira/browse/SOLR-13817
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Andrzej Bialecki
>Assignee: Andrzej Bialecki
>Priority: Major
> Fix For: 8.4
>
> Attachments: SOLR-13817-8x.patch, SOLR-13817-master.patch
>
>
> Now that SOLR-8241 has been committed I propose to deprecate other cache 
> implementations in 8x and remove them altogether from 9.0, in order to reduce 
> confusion and maintenance costs.






[jira] [Commented] (SOLR-13933) Cluster mode Stress test suite

2019-11-14 Thread Ishan Chattopadhyaya (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16974486#comment-16974486
 ] 

Ishan Chattopadhyaya commented on SOLR-13933:
-

Will attempt to have the first cut of this in a week.

> Cluster mode Stress test suite 
> ---
>
> Key: SOLR-13933
> URL: https://issues.apache.org/jira/browse/SOLR-13933
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Ishan Chattopadhyaya
>Assignee: Ishan Chattopadhyaya
>Priority: Major
>
> We need a stress test harness based on 10s or 100s of nodes, 1000s of 
> collection API operations, overseer operations etc. This suite should run 
> nightly and help with:
> # Uncover stability problems
> # Benchmarking (timings, resource metrics etc.) on collection operations
> # Indexing/querying performance
> # Validate the accuracy of potential improvements
> References:
> SOLR-10317
> https://github.com/lucidworks/solr-scale-tk
> https://github.com/shalinmangar/solr-perf-tools
> Lucene benchmarks






[jira] [Updated] (SOLR-13933) Cluster mode Stress test suite

2019-11-14 Thread Ishan Chattopadhyaya (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-13933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ishan Chattopadhyaya updated SOLR-13933:

Description: 
We need a stress test harness based on 10s or 100s of nodes, 1000s of 
collection API operations, overseer operations etc. This suite should run 
nightly and publish results publicly, so as to help with:
# Uncover stability problems
# Benchmarking (timings, resource metrics etc.) on collection operations
# Indexing/querying performance
# Validate the accuracy of potential improvements

References:
SOLR-10317
https://github.com/lucidworks/solr-scale-tk
https://github.com/shalinmangar/solr-perf-tools
Lucene benchmarks

  was:
We need a stress test harness based on 10s or 100s of nodes, 1000s of 
collection API operations, overseer operations etc. This suite should run 
nightly and help with:
# Uncover stability problems
# Benchmarking (timings, resource metrics etc.) on collection operations
# Indexing/querying performance
# Validate the accuracy of potential improvements

References:
SOLR-10317
https://github.com/lucidworks/solr-scale-tk
https://github.com/shalinmangar/solr-perf-tools
Lucene benchmarks


> Cluster mode Stress test suite 
> ---
>
> Key: SOLR-13933
> URL: https://issues.apache.org/jira/browse/SOLR-13933
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Ishan Chattopadhyaya
>Assignee: Ishan Chattopadhyaya
>Priority: Major
>
> We need a stress test harness based on 10s or 100s of nodes, 1000s of 
> collection API operations, overseer operations etc. This suite should run 
> nightly and publish results publicly, so as to help with:
> # Uncover stability problems
> # Benchmarking (timings, resource metrics etc.) on collection operations
> # Indexing/querying performance
> # Validate the accuracy of potential improvements
> References:
> SOLR-10317
> https://github.com/lucidworks/solr-scale-tk
> https://github.com/shalinmangar/solr-perf-tools
> Lucene benchmarks






[jira] [Commented] (LUCENE-8987) Move Lucene web site from svn to git

2019-11-14 Thread Namgyu Kim (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16974512#comment-16974512
 ] 

Namgyu Kim commented on LUCENE-8987:


Awesome work! [~janhoy]
 I found there are some simple mistakes :D

1) Resource links in [https://lucene.staged.apache.org/core/] are wrong (right 
side of the page).
 [https://lucene.staged.apache.org/discussion.html] => 
[https://lucene.staged.apache.org/core/discussion.html]
 [https://lucene.staged.apache.org/developer.html] => 
[https://lucene.staged.apache.org/core/developer.html]
 [https://lucene.staged.apache.org/features.html] => 
[https://lucene.staged.apache.org/core/features.html]
 But [https://lucene.staged.apache.org/core/features.html] is not found.
 [https://lucene.staged.apache.org/downloads.html] => 
[https://lucene.staged.apache.org/core/downloads.html]

2) On the mailing-list pages, there is some unchanged content.
 As you know, our Slack page is #lucene-dev now.
 It was changed a week ago and I changed the web page an hour ago.
 [https://lucene.apache.org/core/discussion.html#slack]
 [https://lucene.apache.org/solr/community.html#slack]
 Channel name #lucene-solr -> #lucene-dev

> Move Lucene web site from svn to git
> 
>
> Key: LUCENE-8987
> URL: https://issues.apache.org/jira/browse/LUCENE-8987
> Project: Lucene - Core
>  Issue Type: Task
>  Components: general/website
>Reporter: Jan Høydahl
>Assignee: Jan Høydahl
>Priority: Major
> Attachments: lucene-site-repo.png
>
>
> INFRA just enabled [a new way of configuring website 
> build|https://s.apache.org/asfyaml] from a git branch, [see dev list 
> email|https://lists.apache.org/thread.html/b6f7e40bece5e83e27072ecc634a7815980c90240bc0a2ccb417f1fd@%3Cdev.lucene.apache.org%3E].
>  It allows for automatic builds of both staging and production site, much 
> like the old CMS. We can choose to auto publish the html content of an 
> {{output/}} folder, or to have a bot build the site using 
> [Pelican|https://github.com/getpelican/pelican] from a {{content/}} folder.
> The goal of this issue is to explore how this can be done for 
> [http://lucene.apache.org|http://lucene.apache.org/] by creating a new 
> git repo {{lucene-site}}, copy over the site from svn, see if it can be 
> "Pelicanized" easily and then test staging. Benefits are that more people 
> will be able to edit the web site and we can take PRs from the public (with 
> GitHub preview of pages).
> Non-goals:
>  * Create a new web site or a new graphic design
>  * Change from Markdown to Asciidoc



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-13101) Shared storage support in SolrCloud

2019-11-14 Thread Andy Vuong (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16974515#comment-16974515
 ] 

Andy Vuong commented on SOLR-13101:
---

We still need to add some documentation to the ref-guide on how to 
configure/use the feature from an end-user perspective. I suppose a doc 
covering the public interfaces, additions to ZK, and the overall design will 
be useful for Solr developers as well.

 

> Shared storage support in SolrCloud
> ---
>
> Key: SOLR-13101
> URL: https://issues.apache.org/jira/browse/SOLR-13101
> Project: Solr
>  Issue Type: New Feature
>  Components: SolrCloud
>Reporter: Yonik Seeley
>Priority: Major
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
> Solr should have first-class support for shared storage (blob/object stores 
> like S3, google cloud storage, etc. and shared filesystems like HDFS, NFS, 
> etc).
> The key component will likely be a new replica type for shared storage.  It 
> would have many of the benefits of the current "pull" replicas (not indexing 
> on all replicas, all shards identical with no shards getting out-of-sync, 
> etc), but would have additional benefits:
>  - Any shard could become leader (the blob store always has the index)
>  - Better elasticity scaling down
>- durability not linked to number of replicas... a single replica could be 
> common for write workloads
>- could drop to 0 replicas for a shard when not needed (blob store always 
> has index)
>  - Allow for higher performance write workloads by skipping the transaction 
> log
>- don't pay for what you don't need
>- a commit will be necessary to flush to stable storage (blob store)
>  - A lot of the complexity and failure modes go away
> An additional component a Directory implementation that will work well with 
> blob stores.  We probably want one that treats local disk as a cache since 
> the latency to remote storage is so large.  I think there are still some 
> "locking" issues to be solved here (ensuring that more than one writer to the 
> same index won't corrupt it).  This should probably be pulled out into a 
> different JIRA issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8983) PhraseWildcardQuery - new query to control and optimize wildcard expansions in phrase

2019-11-14 Thread Ken LaPorte (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16974514#comment-16974514
 ] 

Ken LaPorte commented on LUCENE-8983:
-

Hi [~bruno.roustant]. I don't yet. The team we're working with is reluctant to 
make modifications to the software at this point as they have released to their 
beta clients. At present, we've shifted to testing this internally in the hopes 
of making progress there. 

> PhraseWildcardQuery - new query to control and optimize wildcard expansions 
> in phrase
> -
>
> Key: LUCENE-8983
> URL: https://issues.apache.org/jira/browse/LUCENE-8983
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Reporter: Bruno Roustant
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> A generalized version of PhraseQuery, built with one or more MultiTermQuery 
> that provides term expansions for multi-terms (one of the expanded terms must 
> match).
> Its main advantage is to control the total number of expansions across all 
> MultiTermQuery and across all segments.
>  This query is similar to MultiPhraseQuery, but it handles, controls and 
> optimizes the multi-term expansions.
>  
>  This query is equivalent to building an ordered SpanNearQuery with a list of 
> SpanTermQuery and SpanMultiTermQueryWrapper.
>  But it optimizes the multi-term expansions and the segment accesses.
> It first resolves the single-terms to stop early if one does not match. 
> Then it expands each multi-term sequentially, stopping immediately if one 
> does not match. It detects the segments that do not match to skip them for 
> the next expansions. This often avoids expanding the other multi-terms on some 
> or even all segments. And finally it controls the total number of expansions.
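For illustration only, here is a rough pure-Java sketch of the shared expansion budget and early-stop behavior the description outlines. The names `expand`, `dict`, and `budget` are invented for this sketch, prefix matching stands in for real wildcard matching, and none of this is the actual Lucene implementation:

```java
import java.util.ArrayList;
import java.util.List;

public class ExpansionBudgetSketch {
    // Expand one pattern against a term dictionary, drawing from a single
    // budget that is shared across ALL multi-terms in the phrase.
    static List<String> expand(String prefix, String[] dict, int[] budget) {
        List<String> out = new ArrayList<>();
        for (String term : dict) {
            if (budget[0] == 0) break;      // global cap reached, stop expanding
            if (term.startsWith(prefix)) {  // stand-in for wildcard matching
                out.add(term);
                budget[0]--;                // every expansion consumes the shared budget
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String[] dict = {"quick", "quince", "quiet", "slow"};
        int[] budget = {3};                 // total expansions allowed overall
        List<String> qu = expand("qu", dict, budget);
        if (qu.isEmpty()) return;           // early stop: no match means no more work
        List<String> sl = expand("sl", dict, budget);
        System.out.println(qu.size() + " " + sl.size()); // prints "3 0"
    }
}
```

The point of the shared counter is that a phrase with several wildcards cannot multiply its work: once one pattern exhausts the budget, later patterns expand to nothing instead of scanning the dictionary again.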



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9046) Fix wrong example in Javadoc of TermInSetQuery

2019-11-14 Thread Namgyu Kim (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9046?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16974518#comment-16974518
 ] 

Namgyu Kim commented on LUCENE-9046:


Thanks! [~jpountz]

> Fix wrong example in Javadoc of TermInSetQuery
> --
>
> Key: LUCENE-9046
> URL: https://issues.apache.org/jira/browse/LUCENE-9046
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Namgyu Kim
>Assignee: Namgyu Kim
>Priority: Minor
> Fix For: 8.x, master (9.0)
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> There is a wrong example in Javadoc of TermInSetQuery.
> This patch will be merged to the master and 8.x branch.
>  
> Before
> {code:java}
> Query q1 = new TermInSetQuery(new Term("field", "foo"), new Term("field", 
> "bar"));
> BooleanQuery bq = new BooleanQuery();
> bq.add(new TermQuery(new Term("field", "foo")), Occur.SHOULD);
> bq.add(new TermQuery(new Term("field", "bar")), Occur.SHOULD);
> Query q2 = new ConstantScoreQuery(bq);
> {code}
> After
> {code:java}
> Query q1 = new TermInSetQuery("field", new BytesRef("foo"), new 
> BytesRef("bar"));
> BooleanQuery bq = new BooleanQuery();
> bq.add(new TermQuery(new Term("field", "foo")), Occur.SHOULD);
> bq.add(new TermQuery(new Term("field", "bar")), Occur.SHOULD);
> Query q2 = new ConstantScoreQuery(bq);
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-13930) Running TestKoreanTokenizer with Ant fails in gradle_8 build

2019-11-14 Thread Erick Erickson (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16974516#comment-16974516
 ] 

Erick Erickson commented on SOLR-13930:
---

See my comment just above. It's the _ant_ build that fails on the GW branch.

> Running TestKoreanTokenizer with Ant fails  in gradle_8 build
> -
>
> Key: SOLR-13930
> URL: https://issues.apache.org/jira/browse/SOLR-13930
> Project: Solr
>  Issue Type: Task
>  Security Level: Public(Default Security Level. Issues are Public) 
> Environment: This fails with:
> java.lang.RuntimeException: Cannot find userdict.txt in test classpath!
> userdict.txt gets copied when I test on the trunk branch to (at least I think 
> this is the corresponding one):
> ./lucene/build/analysis/nori/*classes*/test/org/apache/lucene/analysis/ko/userdict.txt
> So my presumption is that the ant build takes care of this and somehow the 
> classpath is set to include it.
> This is on a clean checkout of the current gradle_8 branch, _without_ trying 
> to do anything with Gradle.
>Reporter: Erick Erickson
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9018) Separator for ConcatenateGraphFilterFactory

2019-11-14 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16974517#comment-16974517
 ] 

ASF subversion and git services commented on LUCENE-9018:
-

Commit e466d622c8161038d4e0730e2925474a0a05d596 in lucene-solr's branch 
refs/heads/master from David Smiley
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=e466d62 ]

LUCENE-9018: ConcatenateGraphFilter now has a configurable separator.


> Separator for ConcatenateGraphFilterFactory
> ---
>
> Key: LUCENE-9018
> URL: https://issues.apache.org/jira/browse/LUCENE-9018
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/analysis
>Reporter: Stanislav Mikulchik
>Assignee: David Smiley
>Priority: Minor
> Attachments: LUCENE-9018.patch, LUCENE-9018.patch, LUCENE-9018.patch
>
>
> I would like to have an option to choose a separator to use for token 
> concatenation. Currently ConcatenateGraphFilterFactory can use only the 
> "\u001F" symbol.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] uschindler commented on issue #889: LUCENE-8983: Add PhraseWildcardQuery to control multi-terms expansions in a phrase

2019-11-14 Thread GitBox
uschindler commented on issue #889: LUCENE-8983: Add PhraseWildcardQuery to 
control multi-terms expansions in a phrase
URL: https://github.com/apache/lucene-solr/pull/889#issuecomment-554021084
 
 
   The current way this is done (creating the MultiTermQuery TermsEnum) 
violates the API. The method MTQ#getTermsEnum is protected, so it should never 
be called from the outside. Java just allows this from the same package, but 
it's still incorrect: protected methods should only be called from the class 
itself and its subclasses.
   
   Expanding terms of a MTQ should be done by passing a RewriteMethod and then 
rewriting it.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] uschindler edited a comment on issue #889: LUCENE-8983: Add PhraseWildcardQuery to control multi-terms expansions in a phrase

2019-11-14 Thread GitBox
uschindler edited a comment on issue #889: LUCENE-8983: Add PhraseWildcardQuery 
to control multi-terms expansions in a phrase
URL: https://github.com/apache/lucene-solr/pull/889#issuecomment-554021084
 
 
   The current way this is done (creating the MultiTermQuery TermsEnum) 
violates the API. The method MTQ#getTermsEnum is protected, so it should never 
be called from the outside. Java just allows this from the same package, but 
it's still incorrect: protected methods should only be called from the class 
itself and its subclasses.
   
   Expanding terms of a MTQ should be done by passing a RewriteMethod and then 
rewriting it (this still looks like a hack, but it's the correct way). Some MTQ 
queries may make adjustments on rewrite.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9018) Separator for ConcatenateGraphFilterFactory

2019-11-14 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9018?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16974528#comment-16974528
 ] 

ASF subversion and git services commented on LUCENE-9018:
-

Commit e5f2b2380b6e93d48df5f1733113c6b6c0bc090c in lucene-solr's branch 
refs/heads/branch_8x from David Smiley
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=e5f2b23 ]

LUCENE-9018: ConcatenateGraphFilter now has a configurable separator.

(cherry picked from commit e466d622c8161038d4e0730e2925474a0a05d596)


> Separator for ConcatenateGraphFilterFactory
> ---
>
> Key: LUCENE-9018
> URL: https://issues.apache.org/jira/browse/LUCENE-9018
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/analysis
>Reporter: Stanislav Mikulchik
>Assignee: David Smiley
>Priority: Minor
> Attachments: LUCENE-9018.patch, LUCENE-9018.patch, LUCENE-9018.patch
>
>
> I would like to have an option to choose a separator to use for token 
> concatenation. Currently ConcatenateGraphFilterFactory can use only the 
> "\u001F" symbol.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (LUCENE-9018) Separator for ConcatenateGraphFilterFactory

2019-11-14 Thread David Smiley (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley updated LUCENE-9018:
-
Fix Version/s: 8.4
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

Thanks for contributing!

Note: I changed the reference in the factory from Version.LATEST to 
Version.LUCENE_8_4_0 since that is the specific version that introduces this 
toggle. I know those specific versions are marked deprecated, which is 
confusing and perhaps dissuaded you.

> Separator for ConcatenateGraphFilterFactory
> ---
>
> Key: LUCENE-9018
> URL: https://issues.apache.org/jira/browse/LUCENE-9018
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/analysis
>Reporter: Stanislav Mikulchik
>Assignee: David Smiley
>Priority: Minor
> Fix For: 8.4
>
> Attachments: LUCENE-9018.patch, LUCENE-9018.patch, LUCENE-9018.patch
>
>
> I would like to have an option to choose a separator to use for token 
> concatenation. Currently ConcatenateGraphFilterFactory can use only the 
> "\u001F" symbol.
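A hypothetical configuration snippet showing how the new toggle might look in a Solr field type. The attribute name `tokenSeparator` and the surrounding analyzer chain are assumptions for illustration, not confirmed by this thread; check the 8.4 Javadoc for the actual name:

```xml
<!-- hypothetical; the separator attribute name is assumed, not confirmed here -->
<fieldType name="completion" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ConcatenateGraphFilterFactory" tokenSeparator="-"/>
  </analyzer>
</fieldType>
```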



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-13890) Add postfilter support to {!terms} queries

2019-11-14 Thread Jason Gerlowski (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16974534#comment-16974534
 ] 

Jason Gerlowski commented on SOLR-13890:


Thanks Mikhail, I'll update my pass at the docs soon!

> Add postfilter support to {!terms} queries
> --
>
> Key: SOLR-13890
> URL: https://issues.apache.org/jira/browse/SOLR-13890
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: query parsers
>Affects Versions: master (9.0)
>Reporter: Jason Gerlowski
>Assignee: Jason Gerlowski
>Priority: Major
> Attachments: SOLR-13890.patch, SOLR-13890.patch
>
>
> There are some use-cases where it'd be nice if the "terms" qparser created a 
> query that could be run as a postfilter.  Particularly, when users are 
> checking for hundreds or thousands of terms, a postfilter implementation can 
> be more performant than the standard processing.
> With this issue, I'd like to propose a post-filter implementation for the 
> {{docValuesTermsFilter}} "method".  Postfilter creation can use a 
> SortedSetDocValues object to populate a DV bitset with the "terms" being 
> checked for.  Each document run through the post-filter can look at their 
> doc-values for the field in question and check them efficiently against the 
> constructed bitset.
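The ordinal-bitset check described above can be sketched in pure Java. This is an illustration of the idea only, not Solr's PostFilter API: `Arrays.binarySearch` over a sorted term dictionary stands in for a `SortedSetDocValues` ordinal lookup, and the names here are invented:

```java
import java.util.Arrays;
import java.util.BitSet;

public class TermsPostFilterSketch {
    public static void main(String[] args) {
        // Sorted term dictionary, standing in for a field's docValues terms.
        String[] termDict = {"apple", "banana", "cherry", "date"};
        // Terms the {!terms} query is checking for.
        String[] queryTerms = {"banana", "date"};

        // One-time setup: mark the ordinal of each query term in a bitset.
        BitSet allowed = new BitSet(termDict.length);
        for (String t : queryTerms) {
            int ord = Arrays.binarySearch(termDict, t); // lookupTerm analogue
            if (ord >= 0) allowed.set(ord);
        }

        // Per-document check: a doc matches if any of its value ordinals
        // is set in the bitset. This is O(values per doc), not O(query terms).
        int[] docOrds = {1, 3}; // ordinals of one document's values
        boolean matches = false;
        for (int ord : docOrds) {
            if (allowed.get(ord)) { matches = true; break; }
        }
        System.out.println(matches); // prints "true"
    }
}
```

The reason this can beat the standard processing for thousands of terms is that the per-document cost no longer depends on the number of query terms: membership is a single bit test per document value.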



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-8987) Move Lucene web site from svn to git

2019-11-14 Thread Adam Walz (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16974536#comment-16974536
 ] 

Adam Walz commented on LUCENE-8987:
---

Thanks [~danmuzi], there are some known issues. I still need to go through each 
page with a fine-toothed comb to ensure parity with production. This process 
will be easier now that the site is on staging rather than building locally 
only. I'll go through these mistakes this weekend. 

 

I've been trying to port changes in from the svn site, but haven't ported 
anything in the last week, which is why the Slack channel is unchanged. I'll 
fix that.

> Move Lucene web site from svn to git
> 
>
> Key: LUCENE-8987
> URL: https://issues.apache.org/jira/browse/LUCENE-8987
> Project: Lucene - Core
>  Issue Type: Task
>  Components: general/website
>Reporter: Jan Høydahl
>Assignee: Jan Høydahl
>Priority: Major
> Attachments: lucene-site-repo.png
>
>
> INFRA just enabled [a new way of configuring website 
> build|https://s.apache.org/asfyaml] from a git branch, [see dev list 
> email|https://lists.apache.org/thread.html/b6f7e40bece5e83e27072ecc634a7815980c90240bc0a2ccb417f1fd@%3Cdev.lucene.apache.org%3E].
>  It allows for automatic builds of both staging and production site, much 
> like the old CMS. We can choose to auto publish the html content of an 
> {{output/}} folder, or to have a bot build the site using 
> [Pelican|https://github.com/getpelican/pelican] from a {{content/}} folder.
> The goal of this issue is to explore how this can be done for 
> [http://lucene.apache.org|http://lucene.apache.org/] by creating a new 
> git repo {{lucene-site}}, copy over the site from svn, see if it can be 
> "Pelicanized" easily and then test staging. Benefits are that more people 
> will be able to edit the web site and we can take PRs from the public (with 
> GitHub preview of pages).
> Non-goals:
>  * Create a new web site or a new graphic design
>  * Change from Markdown to Asciidoc



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (LUCENE-9036) ExitableDirectoryReader to interrupt DocValues as well

2019-11-14 Thread Mikhail Khludnev (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Khludnev updated LUCENE-9036:
-
Attachment: LUCENE-9036.patch
Status: Patch Available  (was: Patch Available)

> ExitableDirectoryReader to interrupt DocValues as well
> --
>
> Key: LUCENE-9036
> URL: https://issues.apache.org/jira/browse/LUCENE-9036
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Mikhail Khludnev
>Priority: Major
> Attachments: LUCENE-9036.patch, LUCENE-9036.patch, LUCENE-9036.patch, 
> LUCENE-9036.patch
>
>
> This allows making AnalyticsComponent and json.facet sensitive to time 
> allowed. 
> Does it make sense? Is it enough to check on DV creation, i.e. per 
> field/segment, or is it worth checking every Nth doc? 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9015) Configure branches, auto build and auto stage/publish

2019-11-14 Thread Adam Walz (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16974541#comment-16974541
 ] 

Adam Walz commented on LUCENE-9015:
---

[~janhoy] For the Python script that merges commits from asf-staging to 
asf-site, I would suggest trying to fit it into tasks.py or publishconf.py, 
which are provided by Pelican.

 

Would you like to work on that, or would you like me to do it this weekend?

> Configure branches, auto build and auto stage/publish
> -
>
> Key: LUCENE-9015
> URL: https://issues.apache.org/jira/browse/LUCENE-9015
> Project: Lucene - Core
>  Issue Type: Sub-task
>Reporter: Jan Høydahl
>Assignee: Jan Høydahl
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Commit to master should build and publish the staging site
> Find a simple way to trigger publishing of main site from staging



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9048) Tutorial and docs section missing from the new website

2019-11-14 Thread Adam Walz (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9048?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16974543#comment-16974543
 ] 

Adam Walz commented on LUCENE-9048:
---

[~janhoy] Yes I will fix this section over the weekend. Thanks for the review

> Tutorial and docs section missing from the new website
> --
>
> Key: LUCENE-9048
> URL: https://issues.apache.org/jira/browse/LUCENE-9048
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: general/website
>Reporter: Jan Høydahl
>Priority: Major
>
> See [https://lucene.staged.apache.org/solr/resources.html#tutorials]
> The Tutorials and Documentation subsections are missing from this page



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9015) Configure branches, auto build and auto stage/publish

2019-11-14 Thread Jira


[ 
https://issues.apache.org/jira/browse/LUCENE-9015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16974564#comment-16974564
 ] 

Jan Høydahl commented on LUCENE-9015:
-

We have to ask infra about the recommended workflow. Perhaps they have a 
“publish” button on the buildbot server? Or could they add some .asf.yaml 
support for it? Imagine if we could commit a git hash to the publish part of 
.asf.yaml that would then build and publish that exact version to prod.

> Configure branches, auto build and auto stage/publish
> -
>
> Key: LUCENE-9015
> URL: https://issues.apache.org/jira/browse/LUCENE-9015
> Project: Lucene - Core
>  Issue Type: Sub-task
>Reporter: Jan Høydahl
>Assignee: Jan Høydahl
>Priority: Major
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Commit to master should build and publish the staging site
> Find a simple way to trigger publishing of main site from staging



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org


