[jira] [Commented] (LUCENE-9321) Port documentation task to gradle

2020-05-02 Thread Dawid Weiss (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17097858#comment-17097858
 ] 

Dawid Weiss commented on LUCENE-9321:
-

bq. Instead of gathering all javadocs into one place and checking relative 
links, could we fix the linting script to make it work with per-project 
folders? In the script, I think we can also forbid adding any more relative 
links (which strengthen interdependencies between sub-projects). 

I agree with Tomoko here. An additional bonus of not having cross-project 
relative links is that javadocs displayed by IDEs work properly. The top-level 
index is a different matter because it is for site needs only (and there you 
can link to each package's javadocs with relative links).

> Port documentation task to gradle
> -
>
> Key: LUCENE-9321
> URL: https://issues.apache.org/jira/browse/LUCENE-9321
> Project: Lucene - Core
>  Issue Type: Sub-task
>  Components: general/build
>Reporter: Tomoko Uchida
>Assignee: Tomoko Uchida
>Priority: Major
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> This is a placeholder issue for porting the ant "documentation" task to gradle. 
> The generated documents should be able to be published on the lucene.apache.org 
> web site on an "as-is" basis.






[GitHub] [lucene-solr] gunasekhardora commented on a change in pull request #1371: SOLR-14333: print readable version of CollapsedPostFilter query

2020-05-02 Thread GitBox


gunasekhardora commented on a change in pull request #1371:
URL: https://github.com/apache/lucene-solr/pull/1371#discussion_r418926468



##
File path: 
solr/core/src/java/org/apache/solr/search/CollapsingQParserPlugin.java
##
@@ -128,6 +128,28 @@ field collapsing (with ngroups) when the number of 
distinct groups
   public static final String HINT_TOP_FC = "top_fc";
   public static final String HINT_MULTI_DOCVALUES = "multi_docvalues";
 
+  public enum NullPolicy {
+IGNORE("ignore", 0),

Review comment:
   @madrob Removed them. Added a unit test to validate that an invalid null 
policy is rejected as an illegal argument as well.
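
   For context, a minimal sketch of the enum pattern from the diff together with the kind of validation discussed (constant names beyond `IGNORE` and the lookup method are assumptions, not the actual Solr code):
   ```java
   public enum NullPolicy {
     IGNORE("ignore", 0),
     COLLAPSE("collapse", 1),
     EXPAND("expand", 2);

     private final String name;
     private final int code;

     NullPolicy(String name, int code) {
       this.name = name;
       this.code = code;
     }

     // Resolve a policy by name; unknown values fail fast, which is what the
     // added unit test for illegal arguments would assert.
     public static NullPolicy fromString(String s) {
       for (NullPolicy p : values()) {
         if (p.name.equals(s)) {
           return p;
         }
       }
       throw new IllegalArgumentException("Invalid nullPolicy: " + s);
     }
   }
   ```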








[jira] [Commented] (LUCENE-9278) Make javadoc folder structure follow Gradle project path

2020-05-02 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17097862#comment-17097862
 ] 

ASF subversion and git services commented on LUCENE-9278:
-

Commit 951efc95be338cab3f693c45a50a9e36a237743e in lucene-solr's branch 
refs/heads/master from Uwe Schindler
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=951efc9 ]

LUCENE-9278: Improved options file creation: All parameters are escaped 
automatically, arguments don't need to be strings (they are converted during 
building options file) (#1479)



> Make javadoc folder structure follow Gradle project path
> 
>
> Key: LUCENE-9278
> URL: https://issues.apache.org/jira/browse/LUCENE-9278
> Project: Lucene - Core
>  Issue Type: Task
>  Components: general/build
>Reporter: Tomoko Uchida
>Assignee: Tomoko Uchida
>Priority: Major
> Fix For: master (9.0)
>
>  Time Spent: 7h 10m
>  Remaining Estimate: 0h
>
> The current javadoc folder structure is derived from the Ant project name, e.g.:
> [https://lucene.apache.org/core/8_4_1/analyzers-icu/index.html]
>  [https://lucene.apache.org/solr/8_4_1/solr-solrj/index.html]
> For the Gradle build, it should follow the Gradle project structure (path) 
> instead of the Ant one, to keep things simple to manage [1]. Hence, it will 
> look like this:
> [https://lucene.apache.org/core/9_0_0/analysis/icu/index.html]
>  [https://lucene.apache.org/solr/9_0_0/solr/solrj/index.html]
> [1] The change was suggested in a conversation between Dawid Weiss and me on 
> a GitHub PR: [https://github.com/apache/lucene-solr/pull/1304]






[jira] [Created] (SOLR-14455) Autoscaling policy for ADDREPLICA not functioning in Metric Based Triggers

2020-05-02 Thread Sujith (Jira)
Sujith created SOLR-14455:
-

 Summary: Autoscaling policy for ADDREPLICA not functioning in 
Metric Based Triggers
 Key: SOLR-14455
 URL: https://issues.apache.org/jira/browse/SOLR-14455
 Project: Solr
  Issue Type: Bug
  Security Level: Public (Default Security Level. Issues are Public)
Reporter: Sujith


The autoscaling policy for ADDREPLICA is not functioning in metric-based 
triggers. The "preferredOperation" was set to "*ADDREPLICA*" for a sample 
metric trigger and it wasn't functioning. On the other hand, the MOVEREPLICA 
operation works as expected. I tried this in Solr version 7.5.






[GitHub] [lucene-solr] mocobeta commented on pull request #1477: LUCENE-9321: Port markdown task to Gradle

2020-05-02 Thread GitBox


mocobeta commented on pull request #1477:
URL: https://github.com/apache/lucene-solr/pull/1477#issuecomment-622866628


   Just for clarification... Eventually, the `build/documentation` folder 
should look like this regardless of which way we choose - keeping per-project 
javadoc outputs, or outputting docs into the top-level `documentation` folder 
from the beginning:
   
   **Lucene**
   ```
   lucene/build/documentation
   ├── JRE_VERSION_MIGRATION.html
   ├── MIGRATE.html
   ├── SYSTEM_REQUIREMENTS.html
   ├── index.html
   ├── lucene_green_300.gif
   ├── changes
   ├── analysis
   │   ├── common
   │   ├── icu
   │   ├── kuromoji
   │   ├── morfologik
   │   ├── nori
   │   ├── opennlp
   │   ├── phonetic
   │   ├── smartcn
   │   └── stempel
   ├── backward-codecs
   ├── benchmark
   ├── classification
   ├── codecs
   ├── core
   ├── demo
   ├── expressions
   ├── facet
   ├── grouping
   ├── highlighter
   ├── join
   ├── luke
   ├── memory
   ├── misc
   ├── monitor
   ├── queries
   ├── queryparser
   ├── replicator
   ├── sandbox
   ├── spatial-extras
   ├── spatial3d
   ├── suggest
   └── test-framework
   ```
   
   **Solr**
   ```
   solr/build/documentation
   ├── SYSTEM_REQUIREMENTS.html
   ├── index.html
   ├── changes
   ├── images
   ├── contrib
   │   ├── analysis-extras
   │   ├── analytics
   │   ├── clustering
   │   ├── dataimporthandler
   │   ├── dataimporthandler-extras
   │   ├── extraction
   │   ├── jaegertracer-configurator
   │   ├── langid
   │   ├── ltr
   │   ├── prometheus-exporter
   │   └── velocity
   ├── core
   ├── solrj
   └── test-framework
   ```
   
   Each subproject's javadoc folder structure is consistent with the Gradle 
project path (as I emphasized on LUCENE-9278). Both `build/documentation` 
folders should be uploaded to the lucene.apache.org website on an as-is basis 
(that's the final purpose of the `documentation` task).






[GitHub] [lucene-solr] dweiss commented on pull request #1477: LUCENE-9321: Port markdown task to Gradle

2020-05-02 Thread GitBox


dweiss commented on pull request #1477:
URL: https://github.com/apache/lucene-solr/pull/1477#issuecomment-622879655


   Thanks for clarifying, Tomoko. I'm much in favor of keeping the javadocs in 
per-project build folders, but if Uwe insists this is a problem, would it be a 
large patch to build those docs under the target documentation location?






[GitHub] [lucene-solr] mocobeta commented on pull request #1477: LUCENE-9321: Port markdown task to Gradle

2020-05-02 Thread GitBox


mocobeta commented on pull request #1477:
URL: https://github.com/apache/lucene-solr/pull/1477#issuecomment-622903717


   > if Uwe insists this is a problem, would it be a large patch to build 
those docs under the target documentation location?
   
   I think all we need is to replace the `project.javadoc.destinationDir` 
variables in this file with the target documentation folder, 
`_docroot_/${pathToDocdir(project.path)}`. (For now, the renderJavadoc task 
does not care about the final destination, so we have to teach it the value of 
_docroot_ in some way.)
   
https://github.com/apache/lucene-solr/blob/master/gradle/render-javadoc.gradle
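
   To illustrate the mapping (a hypothetical Java sketch of what `pathToDocdir` is described to do; the real logic lives in the Gradle script above):
   ```java
   // Maps a Gradle project path to a folder under the per-product docroot,
   // e.g. ":lucene:analysis:icu" -> "analysis/icu", ":solr:solrj" -> "solrj".
   static String pathToDocdir(String projectPath) {
     String[] parts = projectPath.replaceFirst("^:", "").split(":");
     // Drop the leading "lucene"/"solr" segment; the docroot already implies it.
     return String.join("/", java.util.Arrays.copyOfRange(parts, 1, parts.length));
   }
   ```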






[jira] [Created] (SOLR-14456) Compressed requests fail in SolrCloud when the request is routed internally by the serving solr node

2020-05-02 Thread Jira
Samuel García Martínez created SOLR-14456:
-

 Summary: Compressed requests fail in SolrCloud when the request is 
routed internally by the serving solr node
 Key: SOLR-14456
 URL: https://issues.apache.org/jira/browse/SOLR-14456
 Project: Solr
  Issue Type: Bug
  Security Level: Public (Default Security Level. Issues are Public)
  Components: SolrCloud
Affects Versions: 7.7.2
 Environment: Solr version: 7.7.2

Solr cloud enabled

Cluster topology: 6 nodes, 1 single collection, 10 shards and 3 replicas. 1 
HTTP LB using Round Robin over all nodes

All cluster nodes have gzip enabled for all paths, all HTTP verbs and all MIME 
types.

Solr client: HttpSolrClient targeting the HTTP LB
Reporter: Samuel García Martínez


h3. Solr cluster setup
 * Solr version: 7.7.2
 * Solr cloud enabled
 * Cluster topology: 6 nodes, 1 single collection, 10 shards and 3 replicas. 1 
HTTP LB using Round Robin over all nodes
 * All cluster nodes have gzip enabled for all paths, all HTTP verbs and all 
MIME types.
 * Solr client: HttpSolrClient targeting the HTTP LB

h3. Problem description

When the Solr node that receives the request has to forward it to a Solr node 
that can actually perform the query, the response headers are added incorrectly 
to the response, causing any HTTP client to fail, whether it's a SolrClient or 
a basic HTTP client implementation with any other SDK.

To simplify the case, let's start from the following repro scenario:
 * Start one node in cloud mode on port 8983
 * Create one single collection (1 shard, 1 replica)
 * Start another node on port 8984 with the previously started zk (-z 
localhost:9983)
 * Start a Java application and query the cluster using the node on port 8984 
(the one that doesn't host the collection)

Then something like this happens:
 * The application queries node:8984 with compression enabled 
("Accept-Encoding: gzip") and wt=javabin
 * Node:8984 can't perform the query and creates an HTTP request behind the 
scenes to node:8983
 * Node:8983 returns a gzipped response with "Content-Encoding: gzip" and 
"Content-Type: application/octet-stream"
 * Node:8984 adds the "Content-Encoding: gzip" header as the character 
encoding of the response (it should be forwarded as a "Content-Encoding" 
header, not a character encoding)
 * HttpSolrClient receives a "Content-Type: 
application/octet-stream;charset=gzip", causing an exception
 * HttpSolrClient tries to quietly close the connection, but since the stream 
is broken, Utils.consumeFully fails to actually consume the entity (it throws 
another exception in GzipDecompressingEntity#getContent() with "not in GZIP 
format")

The exception thrown by HttpSolrClient is:
{code:java}
java.nio.charset.UnsupportedCharsetException: gzip
 at java.nio.charset.Charset.forName(Charset.java:531)
 at org.apache.http.entity.ContentType.create(ContentType.java:271)
 at org.apache.http.entity.ContentType.create(ContentType.java:261)
 at org.apache.http.entity.ContentType.parse(ContentType.java:319)
 at 
org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:591)
 at 
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:255)
 at 
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:244)
 at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:194)
 at org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:1015)
 at org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:1031)
 at 
org.apache.solr.client.solrj.SolrClient$$FastClassBySpringCGLIB$$7fcf36a0.invoke()
 at 
org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:218){code}
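
For illustration, a minimal servlet-style sketch (hypothetical names and context; not the actual Solr forwarding code) of how the header should be passed through:
{code:java}
import java.io.IOException;
import javax.servlet.http.HttpServletResponse;
import org.apache.http.Header;
import org.apache.http.HttpEntity;
import org.apache.http.HttpResponse;

class ResponseForwarder {
  // Forward Content-Encoding as its own HTTP header and leave Content-Type
  // untouched, instead of producing "application/octet-stream;charset=gzip".
  static void forward(HttpResponse upstream, HttpServletResponse downstream)
      throws IOException {
    Header encoding = upstream.getFirstHeader("Content-Encoding");
    if (encoding != null) {
      downstream.setHeader("Content-Encoding", encoding.getValue()); // e.g. gzip
    }
    Header contentType = upstream.getFirstHeader("Content-Type");
    if (contentType != null) {
      downstream.setContentType(contentType.getValue()); // no charset mangling
    }
    HttpEntity entity = upstream.getEntity();
    if (entity != null) {
      entity.writeTo(downstream.getOutputStream());
    }
  }
}
{code}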
 






[jira] [Commented] (LUCENE-9087) Should the BKD tree use a fixed maxPointsInLeafNode?

2020-05-02 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17097883#comment-17097883
 ] 

ASF subversion and git services commented on LUCENE-9087:
-

Commit 96c47bc8508142b5bd11d2cdc492df380801efec in lucene-solr's branch 
refs/heads/master from Ignacio Vera
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=96c47bc ]

LUCENE-9087: Build always trees with full leaves and lower the default value 
for maxPointsPerLeafNode to 512



> Should the BKD tree use a fixed maxPointsInLeafNode? 
> -
>
> Key: LUCENE-9087
> URL: https://issues.apache.org/jira/browse/LUCENE-9087
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Ignacio Vera
>Priority: Major
> Attachments: Study of BKD tree performance with different values for 
> max points per leaf.pdf
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Currently the BKD tree uses a fixed maxPointsInLeafNode provided in the 
> constructor. For the current default codec the value is set to 1024. This is 
> a good compromise between memory usage and performance of the BKD tree.
> Lowering this value can increase search performance, but it has a penalty in 
> memory usage. Now that the BKD tree can be loaded off-heap, this can be less 
> of a concern. Note that lowering that value too much can hurt performance as 
> well, as the tree becomes too deep and the benefits are gone.
> For data types that use the tree as an effective R-tree (range and shape 
> datatypes) the benefits are larger, as it can minimise the overlap between 
> leaf nodes. 
> Finally, creating too many leaf nodes can be dangerous at write time, as 
> memory usage depends on the number of leaf nodes created. The writer creates 
> a long array of length = numberOfLeafNodes.
> What I am wondering here is whether we can improve this situation in order to 
> create the most efficient tree. My current ideas are:
>  
>  * We can adapt the points per leaf depending on that number, so we create a 
> tree with the best depth and best points per leaf. Note that for the 1D 
> case we have an upper estimate of the number of points that the tree will 
> be indexing. 
>  * Add a mechanism so field types can easily define their best points per 
> leaf. In that case, field types like ranges or shapes can define their own 
> values to minimise overlap.
>  * Maybe the default is just too high now that we can load the tree off-heap.
> Any thoughts?
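
Regarding the first idea above, a rough editorial sketch (not from the committed patch) of deriving a points-per-leaf value from a known upper bound on the point count, so that every leaf comes out (nearly) full in a balanced tree:
{code:java}
class LeafSizing {
  // maxPointsInLeaf is the configured ceiling (e.g. the new default of 512).
  static long pointsPerLeaf(long totalPoints, int maxPointsInLeaf) {
    if (totalPoints <= maxPointsInLeaf) {
      return Math.max(totalPoints, 1); // a single leaf suffices
    }
    long numLeaves = (totalPoints + maxPointsInLeaf - 1) / maxPointsInLeaf; // ceil
    long pow2 = Long.highestOneBit(numLeaves);
    if (pow2 < numLeaves) {
      pow2 <<= 1; // round the leaf count up to a power of two: balanced tree
    }
    return (totalPoints + pow2 - 1) / pow2; // points actually stored per leaf
  }
}
{code}
For example, 1,000,000 points with a ceiling of 512 yields 2048 leaves of about 489 points each, instead of a deeper tree with half-empty leaves.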






[GitHub] [lucene-solr] uschindler commented on pull request #1477: LUCENE-9321: Port markdown task to Gradle

2020-05-02 Thread GitBox


uschindler commented on pull request #1477:
URL: https://github.com/apache/lucene-solr/pull/1477#issuecomment-622927763


   Hi, thanks for the discussion. @mocobeta explains it correctly: you just 
have to change one thing to change the destination dir in render-javadoc.
   
   > I was not aware of that this task depends on projects' relative paths. To 
me, before proceeding it we need to manage to reach a consensus about the 
destination (output) directory for "renderJavadoc" anyway...?
   
   Actually this is my main concern. I will comment on this on the issue; I 
have an idea for that. The reason for the issue is that there are conflicting 
interests: the javadocs JAR on Maven Central vs. the documentation folder on 
the web site and inside the tar.gz of the whole Lucene bundle.






[jira] [Commented] (LUCENE-9087) Should the BKD tree use a fixed maxPointsInLeafNode?

2020-05-02 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17097887#comment-17097887
 ] 

ASF subversion and git services commented on LUCENE-9087:
-

Commit 5a922c3c8523cd01fae4720a57132d12c20f1191 in lucene-solr's branch 
refs/heads/branch_8x from Ignacio Vera
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=5a922c3 ]

LUCENE-9087: Build always trees with full leaves and lower the default value 
for maxPointsPerLeafNode to 512


> Should the BKD tree use a fixed maxPointsInLeafNode? 
> -
>
> Key: LUCENE-9087
> URL: https://issues.apache.org/jira/browse/LUCENE-9087
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Ignacio Vera
>Priority: Major
> Attachments: Study of BKD tree performance with different values for 
> max points per leaf.pdf
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> Currently the BKD tree uses a fixed maxPointsInLeafNode provided in the 
> constructor. For the current default codec the value is set to 1024. This is 
> a good compromise between memory usage and performance of the BKD tree.
> Lowering this value can increase search performance, but it has a penalty in 
> memory usage. Now that the BKD tree can be loaded off-heap, this can be less 
> of a concern. Note that lowering that value too much can hurt performance as 
> well, as the tree becomes too deep and the benefits are gone.
> For data types that use the tree as an effective R-tree (range and shape 
> datatypes) the benefits are larger, as it can minimise the overlap between 
> leaf nodes. 
> Finally, creating too many leaf nodes can be dangerous at write time, as 
> memory usage depends on the number of leaf nodes created. The writer creates 
> a long array of length = numberOfLeafNodes.
> What I am wondering here is whether we can improve this situation in order to 
> create the most efficient tree. My current ideas are:
>  
>  * We can adapt the points per leaf depending on that number, so we create a 
> tree with the best depth and best points per leaf. Note that for the 1D 
> case we have an upper estimate of the number of points that the tree will 
> be indexing. 
>  * Add a mechanism so field types can easily define their best points per 
> leaf. In that case, field types like ranges or shapes can define their own 
> values to minimise overlap.
>  * Maybe the default is just too high now that we can load the tree off-heap.
> Any thoughts?






[jira] [Updated] (LUCENE-9321) Port documentation task to gradle

2020-05-02 Thread Uwe Schindler (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-9321:
--
Attachment: screenshot-1.png

> Port documentation task to gradle
> -
>
> Key: LUCENE-9321
> URL: https://issues.apache.org/jira/browse/LUCENE-9321
> Project: Lucene - Core
>  Issue Type: Sub-task
>  Components: general/build
>Reporter: Tomoko Uchida
>Assignee: Tomoko Uchida
>Priority: Major
> Attachments: screenshot-1.png
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> This is a placeholder issue for porting the ant "documentation" task to gradle. 
> The generated documents should be able to be published on the lucene.apache.org 
> web site on an "as-is" basis.






[jira] [Commented] (SOLR-14453) Solr proximity search highlighting issue

2020-05-02 Thread amit naliyapara (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17097895#comment-17097895
 ] 

amit naliyapara commented on SOLR-14453:


I tried the unified method but it is not working.

> Solr proximity search highlighting issue
> 
>
> Key: SOLR-14453
> URL: https://issues.apache.org/jira/browse/SOLR-14453
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: highlighter
>Affects Versions: 8.4.1
>Reporter: amit naliyapara
>Priority: Major
> Attachments: Highlighted-response.PNG, Not-Highlighted-response.PNG, 
> managed-schema, solr-doc-Id-1.txt
>
>
> I found a problem in the highlighting module: not all the search terms are 
> getting highlighted.
> Sample query: q={!complexphrase+inOrder=true}"pos1 (pos2 OR pos3)"~30&hl=true
> Indexed text: "pos1 pos2 pos3 pos4"
> You can see that only two terms are highlighted, like "pos1 
> pos2 pos3 pos4"
> See the attached Not-Highlighted-response screenshot for this.
> The scenario occurs when the term positions are in order in both the document 
> and the query.
> If the term positions are not in order, it works properly.
> Sample query: q={!complexphrase+inOrder=false}"pos3 (pos1 OR pos2)"~30&hl=true
> You can see that all three terms are highlighted, like "pos1 
> pos2 pos3 pos4"
> See the attached Highlighted-response screenshot for this.
> The behavior has been the same in the Solr source code for a long time (I 
> have checked Solr versions 4 through 7).






[GitHub] [lucene-solr] uschindler commented on pull request #1477: LUCENE-9321: Port markdown task to Gradle

2020-05-02 Thread GitBox


uschindler commented on pull request #1477:
URL: https://github.com/apache/lucene-solr/pull/1477#issuecomment-622931376


   See my lengthy comment here: 
https://issues.apache.org/jira/browse/LUCENE-9321?focusedCommentId=17097899&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17097899
   
   Please read it fully and carefully!






[jira] [Commented] (LUCENE-9321) Port documentation task to gradle

2020-05-02 Thread Uwe Schindler (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17097899#comment-17097899
 ] 

Uwe Schindler commented on LUCENE-9321:
---

Hi,

as said on the PR, there are actually two (or rather three) different ways to 
consume the javadocs, and each of them has different requirements for 
inter-module links:
 # One consumer uses the javadocs from the Maven JAR files (e.g., in the IDE).
 # Other consumers use the website, where the javadocs are all in one place. 
They do this not only for releases, but also for snapshot builds by Jenkins 
(see below; this breaks the current "absolute" links the way they are set up).
 # Somebody downloads the tar.gz file of Lucene and wants to browse the 
javadocs from there. Actually, I do this all the time when I am validating a 
release (because that's the only way to do it, as the javadocs are not yet 
deployed on the central web page).

Consumer #1 is perfectly fine with the current setup. For us it's easy to 
package. The only thing that is currently broken is the way the absolute links 
are generated: they are hardcoded! This cannot stay like that. We have nightly 
snapshot builds on Jenkins where we produce snapshots whose javadoc links point 
nowhere. In the Ant build this is handled by making the "documentation base 
URL" configurable for Lucene/Solr: instead of hardcoding 
{{[https://lucene.apache.org/lucene/a_b_c]}}, the Jenkins server sets a 
property on the Ant invocation. That way, all absolute links are correct. A 
release manager can also set this, but there's currently an automatic mechanism 
in Ant: if the version does not end in "-SNAPSHOT", the links are generated as 
absolute links using the version number. We have version.properties for that. 
This is how the Jenkins Solr 8.x job is set up; the same should be possible for 
Gradle (just define the "base URL path" with 2 properties):

!screenshot-1.png|width=753,height=316!
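
A minimal sketch of that Ant heuristic (editorial illustration; the property handling and URL shape are assumptions based on the description above):
{code:java}
class JavadocBaseUrl {
  // A CI job like Jenkins passes an explicit base URL for snapshot builds;
  // otherwise a release version is turned into the versioned site URL.
  static String baseUrl(String version, String overrideFromBuildProperty) {
    if (overrideFromBuildProperty != null) {
      return overrideFromBuildProperty; // e.g. set per Jenkins invocation
    }
    if (version.endsWith("-SNAPSHOT")) {
      throw new IllegalStateException("snapshot build needs an explicit base URL");
    }
    return "https://lucene.apache.org/core/" + version.replace('.', '_'); // 9.0.0 -> 9_0_0
  }
}
{code}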

This allows browsing the full documentation here: 
[https://builds.apache.org/view/L/view/Lucene/job/Solr-Artifacts-8.x/javadoc/] 
(including valid absolute links, also cross-project to Lucene). All snapshot 
artifacts deployed on snapshots.apache.org (including ZIP files) have those 
links inside. This makes it easy to browse for the user, and also for somebody 
using the artifacts in their IDE (think of Elasticsearch or any other project 
using snapshot artifacts from the ASF). They are perfectly fine; it's now even 
better than before!

Now comes user #3: he downloads the tar.gz/zip file and wants to browse the 
javadocs, or he is the development member who votes on a release and wants to 
view the javadocs. Unfortunately he can't, as all links are dead (the javadocs 
are not yet published). Somebody who downloaded the tar.gz file also wants to 
dive through the documentation with *relative* links. Just copying or 
symlinking all javadocs to some central folder doesn't satisfy that.

User #2 is somewhere in between, but I tend to treat him identically to user 
#3. I don't like publishing HTML pages on lucene.apache.org with absolute links 
to lucene.apache.org. We recently changed to HTTPS, so for cases like that all 
links in historic javadocs would need to be rewritten. Thanks to redirects it 
still works, but there can be man-in-the-middle problems. I wanted to download 
the whole SVN repository in the near future and run {{sed}} through it to fix 
all the old links. This is major work. If links are all relative, you don't 
have that problem.

bq. Other linting tasks in ant's "documentation-lint", ecjLint and 
checkMissingDocs work fine with per-project javadoc folders.

They work because documentation-lint does not check everything. The linter 
does not follow absolute links, so it can't verify them; it just passes. It's 
OK to check that all links within a module are correct, but that can't check 
the full documentation. So before a release, "documentation-lint" must also be 
run at the top level. This is a requirement for the release. But for this to 
work, the links must be relative.

*Now comes my proposal:* 

- I tend to leave the per-project javadocs as they are; they should be used to 
build the Maven artifacts. This makes IDE users happy, and I hope also Dawid. 
The only thing is to allow configuring the Lucene- and Solr-specific "base" URL 
for absolute links. This allows building snapshot artifacts on Jenkins 
correctly. Maybe also copy the "heuristic" from Ant to generate links based on 
whether the version ends in "-SNAPSHOT" or not.
- For the website and the .tar.gz release (so, packaging) the release manager 
should run the whole javadocs a second time (we should *not* copy them). For 
this second run for packaging purposes, we change the javadoc output directory 
to the top-level one (as proposed by Tomoko). In addition, the absolute links 
should be made relative. This can easily be done using the java.net.URI class. 
Just build the absolut
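
A minimal editorial sketch of that {{java.net.URI}} idea (version and paths are illustrative; {{relativize}} only strips a common prefix, so links that must walk up with "../" need extra handling):
{code:java}
import java.net.URI;

class RelativizeLinks {
  public static void main(String[] args) {
    URI docroot = URI.create("https://lucene.apache.org/core/9_0_0/");
    URI absolute = URI.create("https://lucene.apache.org/core/9_0_0/core/index.html");
    // Build the absolute link first, then strip the docroot prefix so the
    // packaged HTML contains only relative links.
    System.out.println(docroot.relativize(absolute)); // prints "core/index.html"
  }
}
{code}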



[jira] [Created] (SOLR-14457) SolrClient leaks a connection forever when an unexpected/malformed Entity is received

2020-05-02 Thread Jira
Samuel García Martínez created SOLR-14457:
-

 Summary: SolrClient leaks a connection forever when an 
unexpected/malformed Entity is received
 Key: SOLR-14457
 URL: https://issues.apache.org/jira/browse/SOLR-14457
 Project: Solr
  Issue Type: Bug
  Security Level: Public (Default Security Level. Issues are Public)
  Components: SolrJ
Affects Versions: 7.7.2
 Environment: Solr version: 7.7.2

Solr cloud enabled

Cluster topology: 6 nodes, 1 single collection, 10 shards and 3 replicas. 1 
HTTP LB using Round Robin over all nodes

All cluster nodes have gzip enabled for all paths, all HTTP verbs and all MIME 
types.

Solr client: HttpSolrClient targeting the HTTP LB
Reporter: Samuel García Martínez


When SolrJ receives a malformed response entity, for example like the one 
described in SOLR-14456, the client leaks the connection forever, as it's never 
released back to the pool.

If Solr (for whatever reason) or any intermediate networking piece (firewall, 
proxy, load balancer) messes up the response, SolrJ tries to release the 
connection, but GzipDecompressingEntity#getContent fails with an 
IOException("Not in GZIP format"), making it impossible to release the 
connection.
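
A defensive sketch (assuming Apache HttpClient 4.x; names and context are illustrative, not the SolrJ fix) of releasing the connection even when consuming the entity fails:
{code:java}
import java.io.IOException;
import org.apache.http.HttpEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpRequestBase;
import org.apache.http.util.EntityUtils;

class QuietRelease {
  static void releaseQuietly(HttpRequestBase request, CloseableHttpResponse response) {
    try {
      HttpEntity entity = response.getEntity();
      if (entity != null) {
        EntityUtils.consume(entity); // normal path: drain and return to pool
      }
    } catch (IOException | RuntimeException e) {
      request.abort(); // broken stream: discard the connection instead of leaking it
    } finally {
      try {
        response.close();
      } catch (IOException ignored) {
        // nothing more to do; the connection was already drained or aborted
      }
    }
  }
}
{code}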






[jira] [Commented] (SOLR-14457) SolrClient leaks a connection forever when an unexpected/malformed Entity is received

2020-05-02 Thread Jira


[ 
https://issues.apache.org/jira/browse/SOLR-14457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17097902#comment-17097902
 ] 

Samuel García Martínez commented on SOLR-14457:
---

Scenarios that corrupt the response, like SOLR-14456, break the connection 
management.

> SolrClient leaks a connection forever when an unexpected/malformed Entity is 
> received
> -
>
> Key: SOLR-14457
> URL: https://issues.apache.org/jira/browse/SOLR-14457
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrJ
>Affects Versions: 7.7.2
> Environment: Solr version: 7.7.2
> Solr cloud enabled
> Cluster topology: 6 nodes, 1 single collection, 10 shards and 3 replicas. 1 
> HTTP LB using Round Robin over all nodes
> All cluster nodes have gzip enabled for all paths, all HTTP verbs and all 
> MIME types.
> Solr client: HttpSolrClient targeting the HTTP LB
>Reporter: Samuel García Martínez
>Priority: Major
>
> When SolrJ receives a malformed response entity, for example like the one 
> described in SOLR-14456, the client leaks the connection forever, as it's 
> never released back to the pool.
> If Solr (for whatever reason) or any intermediate networking piece (firewall, 
> proxy, load balancer) messes up the response, SolrJ tries to release the 
> connection, but GzipDecompressingEntity#getContent fails with an 
> IOException("Not in GZIP format"), making it impossible to release the 
> connection.







[jira] [Updated] (SOLR-14457) SolrClient leaks a connection forever when an unexpected/malformed Entity is received

2020-05-02 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SOLR-14457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Samuel García Martínez updated SOLR-14457:
--
Description: 
When SolrJ receives a malformed response entity, for example like the one 
described in SOLR-14456, the client leaks the connection forever, as it's never 
released back to the pool.

If Solr (for whatever reason) or any intermediate networking piece (firewall, 
proxy, load balancer) messes up the response, SolrJ tries to release the 
connection, but GzipDecompressingEntity#getContent fails with an 
IOException("Not in GZIP format"), making it impossible to release the 
connection.

On top of the bug itself, not being able to set a timeout while waiting for a 
connection to become available makes any application unresponsive, as it will 
eventually run out of threads.

  was:
When the SolrJ receives a malformed response Entity, for example like the one 
described in SOLR-14456, the client leaks the connection forever as it's never 
released back to the pool.

If Solr (for whatever reason) or any intermediate networking piece (firewall, 
proxy, load balancer) messes up the response, SolrJ tries to release the 
connection but GzipDecompressingEntity#getContent fails with an 
IOException("Not in GZIP format"), making it impossible to release the 
connection.


> SolrClient leaks a connection forever when an unexpected/malformed Entity is 
> received
> -
>
> Key: SOLR-14457
> URL: https://issues.apache.org/jira/browse/SOLR-14457
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrJ
>Affects Versions: 7.7.2
> Environment: Solr version: 7.7.2
> Solr cloud enabled
> Cluster topology: 6 nodes, 1 single collection, 10 shards and 3 replicas. 1 
> HTTP LB using Round Robin over all nodes
> All cluster nodes have gzip enabled for all paths, all HTTP verbs and all 
> MIME types.
> Solr client: HttpSolrClient targeting the HTTP LB
>Reporter: Samuel García Martínez
>Priority: Major
>
> When SolrJ receives a malformed response entity, for example like the one 
> described in SOLR-14456, the client leaks the connection forever, as it's 
> never released back to the pool.
> If Solr (for whatever reason) or any intermediate networking piece (firewall, 
> proxy, load balancer) messes up the response, SolrJ tries to release the 
> connection, but GzipDecompressingEntity#getContent fails with an 
> IOException("Not in GZIP format"), making it impossible to release the 
> connection.
> On top of the bug itself, not being able to set a timeout while waiting for a 
> connection to become available makes any application unresponsive, as it will 
> eventually run out of threads.








[jira] [Updated] (SOLR-14457) SolrClient leaks connections on compressed responses if the response is malformed

2020-05-02 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SOLR-14457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Samuel García Martínez updated SOLR-14457:
--
Summary: SolrClient leaks connections on compressed responses if the 
response is malformed  (was: SolrClient leaks a connection forever when an 
unexpected/malformed Entity is received)

> SolrClient leaks connections on compressed responses if the response is 
> malformed
> -
>
> Key: SOLR-14457
> URL: https://issues.apache.org/jira/browse/SOLR-14457
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrJ
>Affects Versions: 7.7.2
> Environment: Solr version: 7.7.2
> Solr cloud enabled
> Cluster topology: 6 nodes, 1 single collection, 10 shards and 3 replicas. 1 
> HTTP LB using Round Robin over all nodes
> All cluster nodes have gzip enabled for all paths, all HTTP verbs and all 
> MIME types.
> Solr client: HttpSolrClient targeting the HTTP LB
>Reporter: Samuel García Martínez
>Priority: Major
>
> When SolrJ receives a malformed response entity, for example like the one 
> described in SOLR-14456, the client leaks the connection forever, as it's 
> never released back to the pool.
> If Solr (for whatever reason) or any intermediate networking piece (firewall, 
> proxy, load balancer) messes up the response, SolrJ tries to release the 
> connection, but GzipDecompressingEntity#getContent fails with an 
> IOException("Not in GZIP format"), making it impossible to release the 
> connection.
> On top of the bug itself, not being able to set a timeout while waiting for a 
> connection to become available makes any application unresponsive, as it will 
> eventually run out of threads.






[jira] [Comment Edited] (LUCENE-9321) Port documentation task to gradle

2020-05-02 Thread Uwe Schindler (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17097899#comment-17097899
 ] 

Uwe Schindler edited comment on LUCENE-9321 at 5/2/20, 10:33 AM:
-

Hi,

as said on the PR, there are actually two (or rather three) different ways to 
consume the javadocs, and each of them has different requirements for 
inter-module links:
 # One consumer is using the javadocs from the Maven JAR files (e.g., in the 
IDE).
 # The other consumers are using the website, where the javadocs are all in one 
place. They not only do this for releases, but also for snapshot builds by 
Jenkins (see below; this makes the current "absolute" links not work the way 
they are currently set up).
 # Somebody downloads the tar.gz file of Lucene and wants to browse the 
javadocs from there. Actually I do this all the time when I am validating a 
release (because that's the only way to do it, as the javadocs are not yet 
deployed on the central web page).

Consumer #1 is perfectly fine with the current setup. For us it's easy to 
package. The only thing that is currently broken is the way the absolute links 
are generated: they are hardcoded!!! This cannot stay like that. We have 
nightly snapshot builds on Jenkins where we produce snapshots whose Javadoc 
links all go nowhere. In the ANT build this is handled by making the 
"Documentation base URL" configurable for Lucene/Solr: instead of hardcoding 
{{[https://lucene.apache.org/lucene/a_b_c]}}, the Jenkins server sets a 
property on the ANT invocation. That way, all absolute links are correct. A 
release manager can also set this, but there's currently an automatic heuristic 
in ANT: if the version does not end in "-SNAPSHOT", the absolute links are 
generated using the version number. We have version.properties for that. This 
is how the Jenkins (Solr 8.x) job is set up; the same should be possible for 
Gradle (just define the "base URL path" with 2 properties):

!screenshot-1.png|width=753,height=316!
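
To make that heuristic concrete, here is a small sketch in plain Java (the 
property handling and URLs are illustrative; the real build would read them 
from version.properties or from properties passed on the build invocation):
{code:java}
// Sketch of the "-SNAPSHOT" heuristic described above; names are illustrative.
final class BaseUrlSketch {
  static String javadocBaseUrl(String version, String baseUrlProperty) {
    if (baseUrlProperty != null) {
      return baseUrlProperty; // e.g. Jenkins passes its own javadoc URL
    }
    if (version.endsWith("-SNAPSHOT")) {
      throw new IllegalStateException(
          "snapshot build without an explicit documentation base URL");
    }
    // release build: derive the absolute URL from the version, e.g. 8.5.1 -> 8_5_1
    return "https://lucene.apache.org/core/" + version.replace('.', '_');
  }
}
{code}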

This allows browsing the full documentation here: 
[https://builds.apache.org/view/L/view/Lucene/job/Solr-Artifacts-8.x/javadoc/] 
(including valid absolute links, also cross-project to Lucene). All snapshot 
artifacts deployed on snapshots.apache.org (including ZIP files) have those 
links inside. This makes it easy for the user to browse, and also for somebody 
using the artifacts in an IDE (think of Elasticsearch or any other project 
using snapshot artifacts from the ASF). They are perfectly fine; it's now even 
better than before!

Now comes user #3: he downloads the tar.gz/zip file and wants to browse the 
Javadocs, or he is the development community member who votes for a release 
and wants to review the javadocs. Unfortunately he can't, as all links are 
dead (the Javadocs are not yet published). Also, somebody who downloaded the 
tar.gz file wants to dive through the documentation via *relative* links. Just 
copying or symlinking all Javadocs to some central folder doesn't satisfy this.

User #2 is somewhere in between, but I tend to treat him identically to user 
#3. I don't like publishing HTML pages on lucene.apache.org with absolute 
links to lucene.apache.org. We recently changed to HTTPS, so in similar cases 
all links in historic Javadocs would need to be rewritten. Thanks to redirects 
it still works, but there can be man-in-the-middle problems. I wanted to 
download the whole SVN repository in the near future and run a {{sed}} through 
it to fix all old links. This is major work. If links are all relative, you 
don't have that problem.

bq. Other linting tasks in ant's "documentation-lint", ecjLint and 
checkMissingDocs work fine with per-project javadoc folder.

They work because documentation-lint does not check everything. The linter 
does not follow absolute links, so it can't verify them; it just passes. It's 
OK to check that all links within a module are correct, but that doesn't check 
the full documentation. So before a release, "documentation-lint" must also be 
run at the top level. This is a requirement for the release. But for this to 
work, the links must be relative.

*Now comes my proposal:* 

- I tend to leave the per-project javadocs as is; they should be used to build 
the Maven artifacts. This makes IDE users happy, and I hope also Dawid. The 
only thing needed is to allow configuring the Lucene- and Solr-specific "base" 
URL for absolute links. This allows snapshot artifacts to be built correctly 
on Jenkins. Maybe also copy the "heuristic" from Ant to generate links based 
on whether the version ends in "-SNAPSHOT" or not.
- For the website and .tar.gz release (so packaging), the release manager 
should run the whole javadocs a second time (we should *not* copy them). For 
this second run, for packaging purposes, we change the Javadocs output 
directory to the top-level one (as proposed by Tomoko). In addition, the 
absolute links should be made relative. This can easily be done

[jira] [Updated] (SOLR-14457) SolrClient leaks connections on compressed responses if the response is malformed

2020-05-02 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SOLR-14457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Samuel García Martínez updated SOLR-14457:
--
Description: 
h3. Summary

When SolrJ receives a malformed response Entity, for example like the one 
described in SOLR-14456, the client leaks the connection forever as it's never 
released back to the pool.
h3. Problem description

HttpSolrClient should have compression enabled, so it uses the compression 
interceptors.

When the response is marked with "Content-Encoding: gzip" but, for whatever 
reason, the body is not in GZIP format, HttpSolrClient tries to close the 
connection using Utils.consumeFully(); creating the GzipInputStream fails and 
throws an exception. The thrown exception makes it impossible to access the 
underlying InputStream to close it, and therefore the connection is leaked.

Although the content of the response should honour the headers specified for 
it, SolrJ should be reliable enough to prevent the connection leak when this 
scenario happens. On top of the bug itself, not being able to set a timeout 
while waiting for a connection to be available makes any application 
unresponsive as it will run out of threads eventually.
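
A minimal defensive sketch of the idea (a hypothetical helper using the Apache 
HttpClient API, not the actual SolrJ patch):
{code:java}
import java.io.IOException;
import org.apache.http.HttpEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.util.EntityUtils;

final class SafeConsume {
  // Return the connection to the pool even when the decompressing entity
  // throws ("Not in GZIP format") while its content is being consumed.
  static void consumeFully(CloseableHttpResponse response) {
    if (response == null) {
      return;
    }
    try {
      HttpEntity entity = response.getEntity();
      if (entity != null) {
        // consumeQuietly swallows the IOException a malformed gzip body causes
        EntityUtils.consumeQuietly(entity);
      }
    } finally {
      try {
        response.close(); // releases the underlying connection regardless
      } catch (IOException ignored) {
        // nothing more we can do; the connection manager discards the route
      }
    }
  }
}
{code}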

  was:
When SolrJ receives a malformed response Entity, for example like the one 
described in SOLR-14456, the client leaks the connection forever as it's never 
released back to the pool.

If Solr (for whatever reason) or any intermediate networking piece (firewall, 
proxy, load balancer) messes up the response, SolrJ tries to release the 
connection, but GzipDecompressingEntity#getContent fails with an 
IOException("Not in GZIP format"), making it impossible to release the 
connection.

On top of the bug itself, not being able to set a timeout while waiting for a 
connection to be available makes any application unresponsive as it will run 
out of threads eventually.


> SolrClient leaks connections on compressed responses if the response is 
> malformed
> -
>
> Key: SOLR-14457
> URL: https://issues.apache.org/jira/browse/SOLR-14457
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: SolrJ
>Affects Versions: 7.7.2
> Environment: Solr version: 7.7.2
> Solr cloud enabled
> Cluster topology: 6 nodes, 1 single collection, 10 shards and 3 replicas. 1 
> HTTP LB using
> Round Robin over all nodes
> All cluster nodes have gzip enabled for all paths, all HTTP verbs and all 
> MIME types.
> Solr client: HttpSolrClient targeting the HTTP LB
>Reporter: Samuel García Martínez
>Priority: Major
>
> h3. Summary
> When SolrJ receives a malformed response Entity, for example like the one 
> described in SOLR-14456, the client leaks the connection forever as it's 
> never released back to the pool.
> h3. Problem description
> HttpSolrClient should have compression enabled, so it uses the compression 
> interceptors.
> When the response is marked with "Content-Encoding: gzip" but, for whatever 
> reason, the body is not in GZIP format, HttpSolrClient tries to close the 
> connection using Utils.consumeFully(); creating the GzipInputStream fails 
> and throws an exception. The thrown exception makes it impossible to access 
> the underlying InputStream to close it, and therefore the connection is 
> leaked.
> Although the content of the response should honour the headers specified 
> for it, SolrJ should be reliable enough to prevent the connection leak when 
> this scenario happens. On top of the bug itself, not being able to set a 
> timeout while waiting for a connection to be available makes any application 
> unresponsive as it will run out of threads eventually.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] romseygeek commented on a change in pull request #1467: LUCENE-9350: Don't hold references to large automata on FuzzyQuery

2020-05-02 Thread GitBox


romseygeek commented on a change in pull request #1467:
URL: https://github.com/apache/lucene-solr/pull/1467#discussion_r418947150



##
File path: lucene/core/src/java/org/apache/lucene/search/FuzzyQuery.java
##
@@ -183,7 +162,7 @@ public void visit(QueryVisitor visitor) {
   if (maxEdits == 0 || prefixLength >= term.text().length()) {
 visitor.consumeTerms(this, term);
   } else {
-automata[automata.length - 1].visit(visitor, this, field);
+visitor.consumeTermsMatching(this, term.field(), () -> 
getAutomata().runAutomaton);

Review comment:
   It's only built if the visitor implementation actually needs it; we're 
passing a `Supplier` now.
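
   A hedged sketch of what the laziness buys (the visitor body is illustrative, 
not part of the PR):

```java
import java.util.function.Supplier;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.QueryVisitor;
import org.apache.lucene.util.automaton.ByteRunAutomaton;

class MatchingTermsVisitor extends QueryVisitor {
  @Override
  public void consumeTermsMatching(Query query, String field,
                                   Supplier<ByteRunAutomaton> automaton) {
    // The expensive Levenshtein automaton is only constructed here, at the
    // moment a visitor asks for it; visitors that ignore the automaton
    // (e.g. ones that merely record fields) never trigger the build.
    ByteRunAutomaton runAutomaton = automaton.get();
    // ... match candidate terms against runAutomaton ...
  }
}
```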





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Resolved] (SOLR-14455) Autoscaling policy for ADDREPLICA not functioning in Metric Based Triggers

2020-05-02 Thread Erick Erickson (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Erick Erickson resolved SOLR-14455.
---
Resolution: Incomplete

Please raise questions like this on the user's list. There's not enough 
information here to reproduce the problem, and a discussion on the user's list 
should clarify exactly what your setup is and whether it's a bug or a 
misunderstanding on your part.

See: 
http://lucene.apache.org/solr/community.html#mailing-lists-irc there are links 
to both Lucene and Solr mailing lists there.

A _lot_ more people will see your question on that list and may be able to help 
more quickly.

You might want to review: 
https://wiki.apache.org/solr/UsingMailingLists

If it's determined that this really is a code issue or enhancement to Lucene or 
Solr and not a configuration/usage problem, we can raise a new JIRA or reopen 
this one.



> Autoscaling policy for ADDREPLICA not functioning in Metric Based Triggers
> --
>
> Key: SOLR-14455
> URL: https://issues.apache.org/jira/browse/SOLR-14455
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Sujith
>Priority: Major
>
> The Autoscaling policy for ADDREPLICA is not functioning in Metric Based 
> Triggers. The "preferredOperation" was given "*ADDREPLICA*" for a sample 
> metric trigger and it wasn't functioning. On the other hand, the operation 
> MOVEREPLICA is working as expected. I tried this in Solr version 7.5.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9355) missing releases from testbackwardscompatibility

2020-05-02 Thread Adrien Grand (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17097940#comment-17097940
 ] 

Adrien Grand commented on LUCENE-9355:
--

[~noble.paul] The smoke tester jobs have been failing since April 27th, see 
e.g. this build failure: 
https://builds.apache.org/view/L/view/Lucene/job/Lucene-Solr-SmokeRelease-8.x/420/console.
 The procedure is documented at 
https://cwiki.apache.org/confluence/display/LUCENE/ReleaseTodo#ReleaseTodo-GenerateBackcompatIndexes.
 I see that the release hasn't been announced yet either, as Jan pointed out on 
the dev list; it looks like you haven't completed all the release steps?

> missing releases from testbackwardscompatibility
> 
>
> Key: LUCENE-9355
> URL: https://issues.apache.org/jira/browse/LUCENE-9355
> Project: Lucene - Core
>  Issue Type: Test
>Reporter: Mike Drob
>Priority: Major
>
> I'm not sure what needs to be added for the 7.7.3 release, but can you take a 
> look at it [~noble] or figure out who to ask for help?
> {noformat}
>[smoker]   confirm all releases have coverage in TestBackwardsCompatibility
>[smoker] find all past Lucene releases...
>[smoker] run TestBackwardsCompatibility..
>[smoker] Releases that don't seem to be tested:
>[smoker]   7.7.3
>[smoker] Traceback (most recent call last):
>[smoker]   File 
> "/home/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-master/dev-tools/scripts/smokeTestRelease.py",
>  line 1487, in <module>
>[smoker] main()
>[smoker]   File 
> "/home/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-master/dev-tools/scripts/smokeTestRelease.py",
>  line 1413, in main
>[smoker] downloadOnly=c.download_only)
>[smoker]   File 
> "/home/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-master/dev-tools/scripts/smokeTestRelease.py",
>  line 1465, in smokeTest
>[smoker] unpackAndVerify(java, 'lucene', tmpDir, 'lucene-%s-src.tgz' % 
> version, gitRevision, version, testArgs, baseURL)
>[smoker]   File 
> "/home/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-master/dev-tools/scripts/smokeTestRelease.py",
>  line 566, in unpackAndVerify
>[smoker] verifyUnpacked(java, project, artifact, unpackPath, 
> gitRevision, version, testArgs, tmpDir, baseURL)
>[smoker]   File 
> "/home/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-master/dev-tools/scripts/smokeTestRelease.py",
>  line 752, in verifyUnpacked
>[smoker] confirmAllReleasesAreTestedForBackCompat(version, unpackPath)
>[smoker]   File 
> "/home/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-master/dev-tools/scripts/smokeTestRelease.py",
>  line 1388, in confirmAllReleasesAreTestedForBackCompat
>[smoker] raise RuntimeError('some releases are not tested by 
> TestBackwardsCompatibility?')
>[smoker] RuntimeError: some releases are not tested by 
> TestBackwardsCompatibility?
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9355) missing releases from testbackwardscompatibility

2020-05-02 Thread Jira


[ 
https://issues.apache.org/jira/browse/LUCENE-9355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17097961#comment-17097961
 ] 

Jan Høydahl commented on LUCENE-9355:
-

All I can say is that the RM job requires careful attention to each and every 
step. I recommend the releaseWizard, since it helps keep a checklist of what is 
completed and what remains. One of the steps should be to add back-compat 
coverage for later releases.

> missing releases from testbackwardscompatibility
> 
>
> Key: LUCENE-9355
> URL: https://issues.apache.org/jira/browse/LUCENE-9355
> Project: Lucene - Core
>  Issue Type: Test
>Reporter: Mike Drob
>Priority: Major
>
> I'm not sure what needs to be added for the 7.7.3 release, but can you take a 
> look at it [~noble] or figure out who to ask for help?
> {noformat}
>[smoker]   confirm all releases have coverage in TestBackwardsCompatibility
>[smoker] find all past Lucene releases...
>[smoker] run TestBackwardsCompatibility..
>[smoker] Releases that don't seem to be tested:
>[smoker]   7.7.3
>[smoker] Traceback (most recent call last):
>[smoker]   File 
> "/home/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-master/dev-tools/scripts/smokeTestRelease.py",
>  line 1487, in <module>
>[smoker] main()
>[smoker]   File 
> "/home/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-master/dev-tools/scripts/smokeTestRelease.py",
>  line 1413, in main
>[smoker] downloadOnly=c.download_only)
>[smoker]   File 
> "/home/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-master/dev-tools/scripts/smokeTestRelease.py",
>  line 1465, in smokeTest
>[smoker] unpackAndVerify(java, 'lucene', tmpDir, 'lucene-%s-src.tgz' % 
> version, gitRevision, version, testArgs, baseURL)
>[smoker]   File 
> "/home/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-master/dev-tools/scripts/smokeTestRelease.py",
>  line 566, in unpackAndVerify
>[smoker] verifyUnpacked(java, project, artifact, unpackPath, 
> gitRevision, version, testArgs, tmpDir, baseURL)
>[smoker]   File 
> "/home/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-master/dev-tools/scripts/smokeTestRelease.py",
>  line 752, in verifyUnpacked
>[smoker] confirmAllReleasesAreTestedForBackCompat(version, unpackPath)
>[smoker]   File 
> "/home/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-master/dev-tools/scripts/smokeTestRelease.py",
>  line 1388, in confirmAllReleasesAreTestedForBackCompat
>[smoker] raise RuntimeError('some releases are not tested by 
> TestBackwardsCompatibility?')
>[smoker] RuntimeError: some releases are not tested by 
> TestBackwardsCompatibility?
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Created] (LUCENE-9356) Add tests for corruptions caused by byte flips

2020-05-02 Thread Adrien Grand (Jira)
Adrien Grand created LUCENE-9356:


 Summary: Add tests for corruptions caused by byte flips
 Key: LUCENE-9356
 URL: https://issues.apache.org/jira/browse/LUCENE-9356
 Project: Lucene - Core
  Issue Type: Test
Reporter: Adrien Grand


We already have tests verifying that file truncation and modification of the 
index headers are caught correctly. I'd like to add another test verifying 
that flipping a byte in a way that modifies the checksum of the file is always 
caught gracefully by Lucene.
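
A rough sketch of the idea (a hypothetical helper; the actual test would be 
randomized across all index files via the test framework):
{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Random;

final class ByteFlipSketch {
  // Flip one byte of an index file; any subsequent checksum verification
  // (e.g. CodecUtil.checksumEntireFile) should then fail with a
  // CorruptIndexException rather than an unrelated exception or silence.
  static void flipOneByte(Path indexFile, Random random) throws IOException {
    byte[] bytes = Files.readAllBytes(indexFile);
    int pos = random.nextInt(bytes.length);
    bytes[pos] ^= (byte) (1 + random.nextInt(255)); // guaranteed to change
    Files.write(indexFile, bytes);
  }
}
{code}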



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] janhoy commented on pull request #341: SOLR-12131: ExternalRoleRuleBasedAuthorizationPlugin

2020-05-02 Thread GitBox


janhoy commented on pull request #341:
URL: https://github.com/apache/lucene-solr/pull/341#issuecomment-622957422


   I believe it is up to date with master now.
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] janhoy commented on a change in pull request #341: SOLR-12131: ExternalRoleRuleBasedAuthorizationPlugin

2020-05-02 Thread GitBox


janhoy commented on a change in pull request #341:
URL: https://github.com/apache/lucene-solr/pull/341#discussion_r418962126



##
File path: 
solr/core/src/java/org/apache/solr/security/RuleBasedAuthorizationPluginBase.java
##
@@ -0,0 +1,270 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.solr.security;
+
+import java.io.IOException;
+import java.lang.invoke.MethodHandles;
+import java.security.Principal;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import java.util.function.Function;
+
+import org.apache.solr.common.SpecProvider;
+import org.apache.solr.common.util.CommandOperation;
+import org.apache.solr.common.util.Utils;
+import org.apache.solr.common.util.ValidatingJsonMap;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import static java.util.Collections.unmodifiableMap;
+import static java.util.function.Function.identity;
+import static java.util.stream.Collectors.toMap;
+import static org.apache.solr.handler.admin.SecurityConfHandler.getListValue;
+
+/**
+ * Base class for rule based authorization plugins
+ */
+public abstract class RuleBasedAuthorizationPluginBase implements 
AuthorizationPlugin, ConfigEditablePlugin, SpecProvider {

Review comment:
   We could of course have kept one RBAC class, made the user-group mapping 
optional and always checked for roles on Principal, but I like the subclass 
approach better.
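
   A hedged sketch of that split (all names and bodies are illustrative stubs, 
not the actual Solr classes):

```java
import java.security.Principal;
import java.util.Collections;
import java.util.Map;
import java.util.Set;

abstract class RolesPluginBaseSketch {
  /** Subclasses decide where a request's roles come from. */
  abstract Set<String> getUserRoles(Principal principal);
}

/** Roles looked up in a configured user-to-roles mapping (security.json style). */
class MappedRolesSketch extends RolesPluginBaseSketch {
  private final Map<String, Set<String>> userToRoles;
  MappedRolesSketch(Map<String, Set<String>> userToRoles) {
    this.userToRoles = userToRoles;
  }
  @Override
  Set<String> getUserRoles(Principal p) {
    return userToRoles.getOrDefault(p.getName(), Collections.emptySet());
  }
}

/** Roles carried directly by an externally authenticated Principal. */
class ExternalRolesSketch extends RolesPluginBaseSketch {
  interface RolesPrincipal extends Principal {
    Set<String> getRoles();
  }
  @Override
  Set<String> getUserRoles(Principal p) {
    return (p instanceof RolesPrincipal)
        ? ((RolesPrincipal) p).getRoles()
        : Collections.emptySet();
  }
}
```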





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] jpountz commented on a change in pull request #1473: LUCENE-9353: Move terms metadata to its own file.

2020-05-02 Thread GitBox


jpountz commented on a change in pull request #1473:
URL: https://github.com/apache/lucene-solr/pull/1473#discussion_r418963368



##
File path: 
lucene/core/src/java/org/apache/lucene/codecs/blocktree/BlockTreeTermsReader.java
##
@@ -148,56 +155,80 @@ public BlockTreeTermsReader(PostingsReaderBase 
postingsReader, SegmentReadState
   CodecUtil.retrieveChecksum(termsIn);
 
   // Read per-field details
-  seekDir(termsIn);
-  seekDir(indexIn);
+  String metaName = IndexFileNames.segmentFileName(segment, 
state.segmentSuffix, TERMS_META_EXTENSION);
+  Map<String, FieldReader> fieldMap = null;
+  Throwable priorE = null;
+  try (ChecksumIndexInput metaIn = version >= VERSION_META_FILE ? 
state.directory.openChecksumInput(metaName, state.context) : null) {
+try {
+  final IndexInput indexMetaIn, termsMetaIn;
+  if (version >= VERSION_META_FILE) {
+CodecUtil.checkIndexHeader(metaIn, TERMS_META_CODEC_NAME, version, 
version, state.segmentInfo.getId(), state.segmentSuffix);
+indexMetaIn = termsMetaIn = metaIn;
+  } else {
+seekDir(termsIn);
+seekDir(indexIn);
+indexMetaIn = indexIn;
+termsMetaIn = termsIn;
+  }
 
-  final int numFields = termsIn.readVInt();
-  if (numFields < 0) {
-throw new CorruptIndexException("invalid numFields: " + numFields, 
termsIn);
-  }
-  fieldMap = new HashMap<>((int) (numFields / 0.75f) + 1);
-  for (int i = 0; i < numFields; ++i) {
-final int field = termsIn.readVInt();
-final long numTerms = termsIn.readVLong();
-if (numTerms <= 0) {
-  throw new CorruptIndexException("Illegal numTerms for field number: 
" + field, termsIn);
-}
-final BytesRef rootCode = readBytesRef(termsIn);
-final FieldInfo fieldInfo = state.fieldInfos.fieldInfo(field);
-if (fieldInfo == null) {
-  throw new CorruptIndexException("invalid field number: " + field, 
termsIn);
-}
-final long sumTotalTermFreq = termsIn.readVLong();
-// when frequencies are omitted, sumDocFreq=sumTotalTermFreq and only 
one value is written.
-final long sumDocFreq = fieldInfo.getIndexOptions() == 
IndexOptions.DOCS ? sumTotalTermFreq : termsIn.readVLong();
-final int docCount = termsIn.readVInt();
-if (version < VERSION_META_LONGS_REMOVED) {
-  final int longsSize = termsIn.readVInt();
-  if (longsSize < 0) {
-throw new CorruptIndexException("invalid longsSize for field: " + 
fieldInfo.name + ", longsSize=" + longsSize, termsIn);
+  final int numFields = termsMetaIn.readVInt();
+  if (numFields < 0) {
+throw new CorruptIndexException("invalid numFields: " + numFields, 
termsMetaIn);
+  }
+  fieldMap = new HashMap<>((int) (numFields / 0.75f) + 1);
+  for (int i = 0; i < numFields; ++i) {
+final int field = termsMetaIn.readVInt();
+final long numTerms = termsMetaIn.readVLong();
+if (numTerms <= 0) {
+  throw new CorruptIndexException("Illegal numTerms for field 
number: " + field, termsMetaIn);
+}
+final BytesRef rootCode = readBytesRef(termsMetaIn);
+final FieldInfo fieldInfo = state.fieldInfos.fieldInfo(field);
+if (fieldInfo == null) {
+  throw new CorruptIndexException("invalid field number: " + 
field, termsMetaIn);
+}
+final long sumTotalTermFreq = termsMetaIn.readVLong();
+// when frequencies are omitted, sumDocFreq=sumTotalTermFreq and 
only one value is written.
+final long sumDocFreq = fieldInfo.getIndexOptions() == 
IndexOptions.DOCS ? sumTotalTermFreq : termsMetaIn.readVLong();
+final int docCount = termsMetaIn.readVInt();
+if (version < VERSION_META_LONGS_REMOVED) {
+  final int longsSize = termsMetaIn.readVInt();
+  if (longsSize < 0) {
+throw new CorruptIndexException("invalid longsSize for field: 
" + fieldInfo.name + ", longsSize=" + longsSize, termsMetaIn);
+  }
+}
+BytesRef minTerm = readBytesRef(termsMetaIn);
+BytesRef maxTerm = readBytesRef(termsMetaIn);
+if (docCount < 0 || docCount > state.segmentInfo.maxDoc()) { // 
#docs with field must be <= #docs
+  throw new CorruptIndexException("invalid docCount: " + docCount 
+ " maxDoc: " + state.segmentInfo.maxDoc(), termsMetaIn);

Review comment:
   Not directly, and these things are hard to test, though I agree we could 
do better. I opened https://issues.apache.org/jira/browse/LUCENE-9356 to try to 
improve the coverage of these code paths.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub

[jira] [Resolved] (SOLR-14453) Solr proximity search highlighting issue

2020-05-02 Thread David Smiley (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Smiley resolved SOLR-14453.
-
Resolution: Won't Fix

I think this is probably a bug or limitation in the underlying SpanQuery and 
not the highlighters. There are some limitations of SpanQuery-based queries in 
that they won't necessarily report all matches in the returned "spans". I don't 
think SpanQueries are going to be fixed (sorry) because it's a rather 
fundamental problem with their internal design.  Instead, the Lucene project 
recently created a new class of queries to semi-replace SpanQuery: 
{{IntervalQuery}}  -- _tah-dah_!  I did a quick hack of 
{{org.apache.lucene.search.uhighlight.TestUnifiedHighlighterTermIntervals#testMatchesSlopBug}}
 to tweak it to look like your bug report here and it highlighted them as you 
want.  Unfortunately, there are no query parsers in Lucene or Solr that produce 
them yet.  Perhaps ComplexPhraseQueryParser should be modified to use 
IntervalQuery instead of SpanQuery.

CC [~romseygeek]

> Solr proximity search highlighting issue
> 
>
> Key: SOLR-14453
> URL: https://issues.apache.org/jira/browse/SOLR-14453
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: highlighter
>Affects Versions: 8.4.1
>Reporter: amit naliyapara
>Priority: Major
> Attachments: Highlighted-response.PNG, Not-Highlighted-response.PNG, 
> managed-schema, solr-doc-Id-1.txt
>
>
> I found some problem in highlighting module. Not all the search terms are 
> getting highlighted.
> Sample query: q={!complexphrase+inOrder=true}"pos1 (pos2 OR pos3)"~30&hl=true
> Indexed text: "pos1 pos2 pos3 pos4"
> You can see that only two terms are highlighted like, "pos1 
> pos2 pos3 pos4"
> Please find attached Not-highlighted-response screen shot for same.
> The scenario occurs when the term positions are in order in both the 
> document and the query.
> If the term positions are not in order then it works properly.
> Sample query: q={!complexphrase+inOrder=false}"pos3 (pos1 OR pos2)"~30&hl=true
> You can see that all three term are highlighted like, "pos1 
> pos2 pos3 pos4"
> Please find attached Highlighted-response screen shot for same.
> The scenario has been the same in the Solr source code for a long time (I 
> have checked Solr versions 4 through 7).



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9321) Port documentation task to gradle

2020-05-02 Thread Dawid Weiss (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17098065#comment-17098065
 ] 

Dawid Weiss commented on LUCENE-9321:
-

bq. For the website and .tar.gz release (so packaging) the release manager 
should run the whole javadocs a second time (we should not copy them).

I wouldn't require a second pass. If it's something required for the "release" 
then let's have a release task in gradle and take care of it there. Otherwise 
the "release" scripts are duplicating what could as well be done within the 
main build script?

Also, I'm sorry if this is a stupid question but can we just *not* have any 
cross-module links at all? How many of these cross-module links are we talking 
about? Maybe we can just dump them altogether?


> Port documentation task to gradle
> -
>
> Key: LUCENE-9321
> URL: https://issues.apache.org/jira/browse/LUCENE-9321
> Project: Lucene - Core
>  Issue Type: Sub-task
>  Components: general/build
>Reporter: Tomoko Uchida
>Assignee: Tomoko Uchida
>Priority: Major
> Attachments: screenshot-1.png
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> This is a placeholder issue for porting ant "documentation" task to gradle. 
> The generated documents should be able to be published on lucene.apache.org 
> web site on "as-is" basis.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Comment Edited] (LUCENE-9321) Port documentation task to gradle

2020-05-02 Thread Dawid Weiss (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17098065#comment-17098065
 ] 

Dawid Weiss edited comment on LUCENE-9321 at 5/2/20, 7:07 PM:
--

bq. For the website and .tar.gz release (so packaging) the release manager 
should run the whole javadocs a second time (we should not copy them).

I wouldn't require a second (independent) build pass. If it's something 
required for the "release" then let's have a release task in gradle and take 
care of it there (javadocs built twice but within the same run of the build - 
the "release" build). Otherwise the "release" scripts are duplicating what 
could as well be done within the main build script?

Also, I'm sorry if this is a stupid question but can we just *not* have any 
cross-module links at all? How many of these cross-module links are we talking 
about? Maybe we can just dump them altogether?



was (Author: dweiss):
bq. For the website and .tar.gz release (so packaging) the release manager 
should run the whole javadocs a second time (we should not copy them).

I wouldn't require a second pass. If it's something required for the "release" 
then let's have a release task in gradle and take care of it there. Otherwise 
the "release" scripts are duplicating what could as well be done within the 
main build script?

Also, I'm sorry if this is a stupid question but can we just *not* have any 
cross-module links at all? How many of these cross-module links are we talking 
about? Maybe we can just dump them altogether?


> Port documentation task to gradle
> -
>
> Key: LUCENE-9321
> URL: https://issues.apache.org/jira/browse/LUCENE-9321
> Project: Lucene - Core
>  Issue Type: Sub-task
>  Components: general/build
>Reporter: Tomoko Uchida
>Assignee: Tomoko Uchida
>Priority: Major
> Attachments: screenshot-1.png
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> This is a placeholder issue for porting ant "documentation" task to gradle. 
> The generated documents should be able to be published on lucene.apache.org 
> web site on "as-is" basis.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] mkhludnev commented on a change in pull request #1462: LUCENE-9328: open group.sort docvalues once per segment.

2020-05-02 Thread GitBox


mkhludnev commented on a change in pull request #1462:
URL: https://github.com/apache/lucene-solr/pull/1462#discussion_r419010061



##
File path: 
lucene/grouping/src/java/org/apache/lucene/search/grouping/DocValuesPoolingReader.java
##
@@ -0,0 +1,175 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.search.grouping;
+
+import java.io.IOException;
+import java.util.HashMap;
+import java.util.Map;
+
+import org.apache.lucene.index.BinaryDocValues;
+import org.apache.lucene.index.DocValues;
+import org.apache.lucene.index.FilterLeafReader;
+import org.apache.lucene.index.LeafReader;
+import org.apache.lucene.index.NumericDocValues;
+import org.apache.lucene.index.SortedDocValues;
+import org.apache.lucene.index.SortedNumericDocValues;
+import org.apache.lucene.index.SortedSetDocValues;
+import org.apache.lucene.index.TermsEnum;
+import org.apache.lucene.search.DocIdSetIterator;
+import org.apache.lucene.util.BytesRef;
+
+/**
+ * Caches docValues for the given {@linkplain LeafReader}.
+ * It is only necessary when a consumer retrieves the same docValues many
+ * times per segment. Returned docValues should be iterated forward only.
+ * Caveat: {@link #getContext()} is completely misleading for this class since
+ * it loses baseDoc and ord from the underlying context.
+ * @lucene.experimental
+ */
+class DocValuesPoolingReader extends FilterLeafReader {
+
+  @FunctionalInterface
+  interface DVSupplier<T extends DocIdSetIterator> {
+    T getDocValues(String field) throws IOException;
+  }
+
+  private Map<String, DocIdSetIterator> cache = new HashMap<>();
+
+  DocValuesPoolingReader(LeafReader in) {
+    super(in);
+  }
+
+  @SuppressWarnings("unchecked")
+  protected <T extends DocIdSetIterator> T computeIfAbsent(String field, DVSupplier<T> supplier) throws IOException {
+    T dv;
+    if ((dv = (T) cache.get(field)) == null) {
+      dv = supplier.getDocValues(field);
+      cache.put(field, dv);
+    }
+    return dv;
+  }
+
+  @Override
+  public CacheHelper getReaderCacheHelper() {
+    return null;
+  }
+
+  @Override
+  public CacheHelper getCoreCacheHelper() {
+    return null;
+  }
+
+  @Override
+  public BinaryDocValues getBinaryDocValues(String field) throws IOException {
+    return computeIfAbsent(field, in::getBinaryDocValues);
+  }
+
+  @Override
+  public NumericDocValues getNumericDocValues(String field) throws IOException {
+    return computeIfAbsent(field, in::getNumericDocValues);
+  }
+
+  @Override
+  public SortedNumericDocValues getSortedNumericDocValues(String field) throws IOException {
+    return computeIfAbsent(field, in::getSortedNumericDocValues);
+  }
+
+  @Override
+  public SortedDocValues getSortedDocValues(String field) throws IOException {
+    return computeIfAbsent(field, in::getSortedDocValues);
+  }
+
+  @Override
+  public SortedSetDocValues getSortedSetDocValues(String field) throws IOException {
+    return computeIfAbsent(field, field1 -> {
+      final SortedSetDocValues sortedSet = in.getSortedSetDocValues(field1);
+      final SortedDocValues singleton = DocValues.unwrapSingleton(sortedSet);

Review comment:
   `SingletonWrapper` is too strict; I'm relaxing it in my own copy. The same 
needs to be done for `NumericsSetDV` as well.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] mkhludnev commented on a change in pull request #1462: LUCENE-9328: open group.sort docvalues once per segment.

2020-05-02 Thread GitBox


mkhludnev commented on a change in pull request #1462:
URL: https://github.com/apache/lucene-solr/pull/1462#discussion_r419010144



##
File path: 
lucene/grouping/src/test/org/apache/lucene/search/grouping/AllGroupHeadsCollectorTest.java
##
@@ -153,23 +187,149 @@ public void testBasic() throws Exception {
 assertTrue(openBitSetContains(new int[]{1, 5}, 
allGroupHeadsCollector.retrieveGroupHeads(maxDoc), maxDoc));
 
 // STRING sort type triggers different implementation
-Sort sortWithinGroup2 = new Sort(new SortField("id_2", 
SortField.Type.STRING, true));
-allGroupHeadsCollector = createRandomCollector(groupField, 
sortWithinGroup2);
+for (Function sortFunc : new Function[] {
+   // (r) -> new SortField("id_2", SortField.Type.STRING, (boolean) r),
+   // (r) -> new SortedSetSortField("id_3", (boolean) r), 
+(r) -> new SortedSetSortField("id_4", (boolean) r)
+}) {
+
+  Sort sortWithinGroup2 = new Sort(sortFunc.apply(true));
+  allGroupHeadsCollector = createRandomCollector(groupField, 
sortWithinGroup2);
+  indexSearcher.search(new TermQuery(new Term("content", "random")), 
allGroupHeadsCollector);
+  assertTrue(arrayContains(new int[] {2, 3, 5, 7}, 
allGroupHeadsCollector.retrieveGroupHeads()));
+  assertTrue(openBitSetContains(new int[] {2, 3, 5, 7}, 
allGroupHeadsCollector.retrieveGroupHeads(maxDoc), maxDoc));
+
+  Sort sortWithinGroup3 = new Sort(sortFunc.apply(false));
+  allGroupHeadsCollector = createRandomCollector(groupField, 
sortWithinGroup3);
+  indexSearcher.search(new TermQuery(new Term("content", "random")), 
allGroupHeadsCollector);
+  // 7 b/c higher doc id wins, even if the order of the field is not in reverse.
+  assertTrue(arrayContains(new int[] {0, 3, 4, 6}, 
allGroupHeadsCollector.retrieveGroupHeads()));
+  assertTrue(openBitSetContains(new int[] {0, 3, 4, 6}, 
allGroupHeadsCollector.retrieveGroupHeads(maxDoc), maxDoc));
+}
+indexSearcher.getIndexReader().close();
+dir.close();
+  }
+
+  public void testBasicBlockJoin() throws Exception {
+final String groupField = "author";
+Directory dir = newDirectory();
+RandomIndexWriter w = new RandomIndexWriter(
+random(),
+dir,
+newIndexWriterConfig(new 
MockAnalyzer(random())).setMergePolicy(newLogMergePolicy()));
+DocValuesType valueType = DocValuesType.SORTED;
+
+// 0
+Document doc = new Document();
+addGroupField(doc, groupField, "author1", valueType);
+doc.add(newTextField("content", "random text", Field.Store.NO));
+doc.add(new NumericDocValuesField("id_1", 1));
+doc.add(new SortedDocValuesField("id_2", new BytesRef("1")));
+addParent(w, doc, new SortedSetDocValuesField("id_3", new BytesRef("10")),
+new SortedSetDocValuesField("id_3", new BytesRef("11")));
+
+// 1
+doc = new Document();
+addGroupField(doc, groupField, "author1", valueType);
+doc.add(newTextField("content", "some more random text blob", 
Field.Store.NO));
+doc.add(new NumericDocValuesField("id_1", 2));
+doc.add(new SortedDocValuesField("id_2", new BytesRef("2")));
+addParent(w, doc, new SortedSetDocValuesField("id_3", new BytesRef("20")),
+  new SortedSetDocValuesField("id_3", new BytesRef("21")));
+
+// 2
+doc = new Document();
+addGroupField(doc, groupField, "author1", valueType);
+doc.add(newTextField("content", "some more random textual data", 
Field.Store.NO));
+doc.add(new NumericDocValuesField("id_1", 3));
+doc.add(new SortedDocValuesField("id_2", new BytesRef("3")));
+addParent(w, doc, new SortedSetDocValuesField("id_3", new BytesRef("30")),
+  new SortedSetDocValuesField("id_3", new BytesRef("31")));
+w.commit(); // To ensure a second segment
+
+// 3
+doc = new Document();
+addGroupField(doc, groupField, "author2", valueType);
+doc.add(newTextField("content", "some random text", Field.Store.NO));
+doc.add(new NumericDocValuesField("id_1", 4));
+doc.add(new SortedDocValuesField("id_2", new BytesRef("4")));
+addParent(w, doc, new SortedSetDocValuesField("id_3", new BytesRef("40")),
+ new SortedSetDocValuesField("id_3", new BytesRef("41")));
+
+// 4
+doc = new Document();
+addGroupField(doc, groupField, "author3", valueType);
+doc.add(newTextField("content", "some more random text", Field.Store.NO));
+doc.add(new NumericDocValuesField("id_1", 5));
+doc.add(new SortedDocValuesField("id_2", new BytesRef("5")));
+addParent(w, doc, new SortedSetDocValuesField("id_3", new BytesRef("50")),
+ new SortedSetDocValuesField("id_3", new BytesRef("51")));
+
+// 5
+doc = new Document();
+addGroupField(doc, groupField, "author3", valueType);
+doc.add(newTextField("content", "random blob", Field.Store.NO));
+doc.add(new NumericDocValuesField("id_1", 6));
+doc.add(new SortedDocValuesField("id_2", new BytesRef("6")))

[GitHub] [lucene-solr] mkhludnev commented on a change in pull request #1462: LUCENE-9328: open group.sort docvalues once per segment.

2020-05-02 Thread GitBox


mkhludnev commented on a change in pull request #1462:
URL: https://github.com/apache/lucene-solr/pull/1462#discussion_r419010400



##
File path: 
lucene/join/src/java/org/apache/lucene/search/join/BlockJoinSelector.java
##
@@ -112,9 +112,9 @@ public static SortedDocValues wrap(final SortedDocValues 
values, Type selection,
*  one value per parent among its {@code children} using the configured
*  {@code selection} type. */
   public static SortedDocValues wrap(final SortedDocValues values, Type 
selection, BitSet parents, DocIdSetIterator children) {
-if (values.docID() != -1) {
-  throw new IllegalArgumentException("values iterator was already 
consumed: values.docID=" + values.docID());
-}
+//if (values.docID() != -1) {

Review comment:
   Would be discussed later, if all other issues resolved. 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] mkhludnev commented on a change in pull request #1462: LUCENE-9328: open group.sort docvalues once per segment.

2020-05-02 Thread GitBox


mkhludnev commented on a change in pull request #1462:
URL: https://github.com/apache/lucene-solr/pull/1462#discussion_r419010654



##
File path: 
lucene/test-framework/src/java/org/apache/lucene/codecs/asserting/AssertingDocValuesFormat.java
##
@@ -285,7 +286,9 @@ public SortedSetDocValues getSortedSet(FieldInfo field) 
throws IOException {
   assert field.getDocValuesType() == DocValuesType.SORTED_SET;
   SortedSetDocValues values = in.getSortedSet(field);
   assert values != null;
-  return new AssertingLeafReader.AssertingSortedSetDocValues(values, 
maxDoc);
+  final SortedDocValues singleton = DocValues.unwrapSingleton(values);
+  return singleton==null ? new 
AssertingLeafReader.AssertingSortedSetDocValues(values, maxDoc) :

Review comment:
   I think it's worth extending this to NumericsSet and also to the other 
usages of AssertingDV. Right now, these DVs aren't handled by 
`DocValues.unwrapSingleton()`. @romseygeek, isn't it worth committing this 
separately?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] MarcusSorealheis commented on pull request #1471: Revised SOLR-14014 PR Against Master

2020-05-02 Thread GitBox


MarcusSorealheis commented on pull request #1471:
URL: https://github.com/apache/lucene-solr/pull/1471#issuecomment-623017503


   > It would be nice to make the Admin UI Disabled page a little bit prettier 
if possible, something more akin to the login page, but that might be out of 
scope of this PR. Feel free to defer that to a follow on if you think it's 
better to handle separately.
   
   Subsequent PR please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9328) SortingGroupHead to reuse DocValues

2020-05-02 Thread Mikhail Khludnev (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17098112#comment-17098112
 ] 

Mikhail Khludnev commented on LUCENE-9328:
--

Ok. [~romseygeek], thanks for the clue. It turns out to be a more complex 
problem. I pushed some broken code to the GitHub PR 
https://github.com/apache/lucene-solr/pull/1462/. Currently I'm stuck at 
[ToParentDocValues.advanceExact()|https://github.com/apache/lucene-solr/blob/master/lucene/join/src/java/org/apache/lucene/search/join/ToParentDocValues.java#L259]
 which turns it into a strict {{advance()}}. I think it can be changed to use 
{{advanceExact()}}. Does that make any sense?
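
For reference, a hedged illustration of the contract difference (the helper is 
illustrative): {{advanceExact(target)}} positions exactly on the target 
document and reports whether it has a value, while a strict {{advance(target)}} 
may overshoot past it.
{code:java}
import java.io.IOException;
import org.apache.lucene.index.NumericDocValues;

final class AdvanceSketch {
  static long valueOrDefault(NumericDocValues dv, int target, long dflt) throws IOException {
    // advanceExact never moves past `target`, so callers can keep walking
    // doc IDs forward one by one; a strict advance() could skip over docs.
    if (dv.docID() <= target && dv.advanceExact(target)) {
      return dv.longValue();
    }
    return dflt;
  }
}
{code}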

> SortingGroupHead to reuse DocValues
> ---
>
> Key: LUCENE-9328
> URL: https://issues.apache.org/jira/browse/LUCENE-9328
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/grouping
>Reporter: Mikhail Khludnev
>Assignee: Mikhail Khludnev
>Priority: Minor
> Attachments: LUCENE-9328.patch, LUCENE-9328.patch, LUCENE-9328.patch
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> That's why 
> https://issues.apache.org/jira/browse/LUCENE-7701?focusedCommentId=17084365&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17084365



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Comment Edited] (LUCENE-9328) SortingGroupHead to reuse DocValues

2020-05-02 Thread Mikhail Khludnev (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17098112#comment-17098112
 ] 

Mikhail Khludnev edited comment on LUCENE-9328 at 5/2/20, 10:11 PM:


Ok. [~romseygeek], thanks for the clue. 
Sorry for noise. WIP.


was (Author: mkhludnev):
Ok. [~romseygeek], thanks for the clue. It turns out to be a more complex 
problem. I pushed some broken code to the GitHub PR 
https://github.com/apache/lucene-solr/pull/1462/. Currently I'm stuck at 
[ToParentDocValues.advanceExact()|https://github.com/apache/lucene-solr/blob/master/lucene/join/src/java/org/apache/lucene/search/join/ToParentDocValues.java#L259]
 which turns it into a strict {{advance()}}. I think it can be changed to use 
{{advanceExact()}}. Does that make any sense?

> SortingGroupHead to reuse DocValues
> ---
>
> Key: LUCENE-9328
> URL: https://issues.apache.org/jira/browse/LUCENE-9328
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: modules/grouping
>Reporter: Mikhail Khludnev
>Assignee: Mikhail Khludnev
>Priority: Minor
> Attachments: LUCENE-9328.patch, LUCENE-9328.patch, LUCENE-9328.patch
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> That's why 
> https://issues.apache.org/jira/browse/LUCENE-7701?focusedCommentId=17084365&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17084365



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9321) Port documentation task to gradle

2020-05-02 Thread Tomoko Uchida (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17098173#comment-17098173
 ] 

Tomoko Uchida commented on LUCENE-9321:
---

bq. How many of these cross-module links are we talking about? Maybe we can 
just dump them altogether?

I think the "checkJavadocLinks.py" does the work. It collects all {{href}} 
attributes in given HTML (regardless of they are absolete or relative, or they 
are external links or cross-module links). Maybe we can dump all links by 
inserting 'print' and run the script.
[https://github.com/apache/lucene-solr/blob/master/dev-tools/scripts/checkJavadocLinks.py#L31]

> Port documentation task to gradle
> -
>
> Key: LUCENE-9321
> URL: https://issues.apache.org/jira/browse/LUCENE-9321
> Project: Lucene - Core
>  Issue Type: Sub-task
>  Components: general/build
>Reporter: Tomoko Uchida
>Assignee: Tomoko Uchida
>Priority: Major
> Attachments: screenshot-1.png
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> This is a placeholder issue for porting ant "documentation" task to gradle. 
> The generated documents should be able to be published on lucene.apache.org 
> web site on "as-is" basis.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] msokolov commented on a change in pull request #1473: LUCENE-9353: Move terms metadata to its own file.

2020-05-02 Thread GitBox


msokolov commented on a change in pull request #1473:
URL: https://github.com/apache/lucene-solr/pull/1473#discussion_r419039099



##
File path: 
lucene/core/src/java/org/apache/lucene/codecs/blocktree/BlockTreeTermsReader.java
##
@@ -148,56 +155,80 @@ public BlockTreeTermsReader(PostingsReaderBase 
postingsReader, SegmentReadState
   CodecUtil.retrieveChecksum(termsIn);
 
   // Read per-field details
-  seekDir(termsIn);
-  seekDir(indexIn);
+  String metaName = IndexFileNames.segmentFileName(segment, 
state.segmentSuffix, TERMS_META_EXTENSION);
+  Map<String, FieldReader> fieldMap = null;
+  Throwable priorE = null;
+  try (ChecksumIndexInput metaIn = version >= VERSION_META_FILE ? 
state.directory.openChecksumInput(metaName, state.context) : null) {
+try {
+  final IndexInput indexMetaIn, termsMetaIn;
+  if (version >= VERSION_META_FILE) {
+CodecUtil.checkIndexHeader(metaIn, TERMS_META_CODEC_NAME, version, 
version, state.segmentInfo.getId(), state.segmentSuffix);
+indexMetaIn = termsMetaIn = metaIn;
+  } else {
+seekDir(termsIn);
+seekDir(indexIn);
+indexMetaIn = indexIn;
+termsMetaIn = termsIn;
+  }
 
-  final int numFields = termsIn.readVInt();
-  if (numFields < 0) {
-throw new CorruptIndexException("invalid numFields: " + numFields, 
termsIn);
-  }
-  fieldMap = new HashMap<>((int) (numFields / 0.75f) + 1);
-  for (int i = 0; i < numFields; ++i) {
-final int field = termsIn.readVInt();
-final long numTerms = termsIn.readVLong();
-if (numTerms <= 0) {
-  throw new CorruptIndexException("Illegal numTerms for field number: 
" + field, termsIn);
-}
-final BytesRef rootCode = readBytesRef(termsIn);
-final FieldInfo fieldInfo = state.fieldInfos.fieldInfo(field);
-if (fieldInfo == null) {
-  throw new CorruptIndexException("invalid field number: " + field, 
termsIn);
-}
-final long sumTotalTermFreq = termsIn.readVLong();
-// when frequencies are omitted, sumDocFreq=sumTotalTermFreq and only 
one value is written.
-final long sumDocFreq = fieldInfo.getIndexOptions() == 
IndexOptions.DOCS ? sumTotalTermFreq : termsIn.readVLong();
-final int docCount = termsIn.readVInt();
-if (version < VERSION_META_LONGS_REMOVED) {
-  final int longsSize = termsIn.readVInt();
-  if (longsSize < 0) {
-throw new CorruptIndexException("invalid longsSize for field: " + 
fieldInfo.name + ", longsSize=" + longsSize, termsIn);
+  final int numFields = termsMetaIn.readVInt();
+  if (numFields < 0) {
+throw new CorruptIndexException("invalid numFields: " + numFields, 
termsMetaIn);
+  }
+  fieldMap = new HashMap<>((int) (numFields / 0.75f) + 1);
+  for (int i = 0; i < numFields; ++i) {
+final int field = termsMetaIn.readVInt();
+final long numTerms = termsMetaIn.readVLong();
+if (numTerms <= 0) {
+  throw new CorruptIndexException("Illegal numTerms for field 
number: " + field, termsMetaIn);
+}
+final BytesRef rootCode = readBytesRef(termsMetaIn);
+final FieldInfo fieldInfo = state.fieldInfos.fieldInfo(field);
+if (fieldInfo == null) {
+  throw new CorruptIndexException("invalid field number: " + 
field, termsMetaIn);
+}
+final long sumTotalTermFreq = termsMetaIn.readVLong();
+// when frequencies are omitted, sumDocFreq=sumTotalTermFreq and 
only one value is written.
+final long sumDocFreq = fieldInfo.getIndexOptions() == 
IndexOptions.DOCS ? sumTotalTermFreq : termsMetaIn.readVLong();
+final int docCount = termsMetaIn.readVInt();
+if (version < VERSION_META_LONGS_REMOVED) {
+  final int longsSize = termsMetaIn.readVInt();
+  if (longsSize < 0) {
+throw new CorruptIndexException("invalid longsSize for field: 
" + fieldInfo.name + ", longsSize=" + longsSize, termsMetaIn);
+  }
+}
+BytesRef minTerm = readBytesRef(termsMetaIn);
+BytesRef maxTerm = readBytesRef(termsMetaIn);
+if (docCount < 0 || docCount > state.segmentInfo.maxDoc()) { // 
#docs with field must be <= #docs
+  throw new CorruptIndexException("invalid docCount: " + docCount 
+ " maxDoc: " + state.segmentInfo.maxDoc(), termsMetaIn);

Review comment:
   Thanks, Adrien





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



--

[jira] [Commented] (SOLR-13289) Support for BlockMax WAND

2020-05-02 Thread Kranti Parisa (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17098238#comment-17098238
 ] 

Kranti Parisa commented on SOLR-13289:
--

Tomas et all, thanks for taking this up. This will be super useful in terms of 
performance gains.

How does this work for:
- Grouping, Sorting 
- Custom scoring via value sources
- Payloads

Is TwoPhaseIterator or custom ranking in collapse/expand mode or in response 
writer the way? or have to implement a custom query overriding the createWeight 
using ScoreMode.TOP_SCORES?
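
For the last point, a hedged sketch of the Lucene-level opt-in (the Solr wiring 
is what this issue adds; names besides the Lucene APIs are illustrative):
{code:java}
import java.io.IOException;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.TopScoreDocCollector;

final class WandSketch {
  static TopDocs topTen(IndexSearcher searcher, Query query) throws IOException {
    // totalHitsThreshold = 10: once 10 hits are collected, the collector
    // switches to ScoreMode.TOP_SCORES, letting the BlockMax WAND scorer
    // skip blocks that cannot beat the current top-k; from then on the
    // total hit count is a lower bound, not an exact number.
    TopScoreDocCollector collector = TopScoreDocCollector.create(10, null, 10);
    searcher.search(query, collector);
    return collector.topDocs();
  }
}
{code}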

> Support for BlockMax WAND
> -
>
> Key: SOLR-13289
> URL: https://issues.apache.org/jira/browse/SOLR-13289
> Project: Solr
>  Issue Type: New Feature
>Reporter: Ishan Chattopadhyaya
>Assignee: Tomas Eduardo Fernandez Lobbe
>Priority: Major
> Attachments: SOLR-13289.patch, SOLR-13289.patch
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> LUCENE-8135 introduced BlockMax WAND as a major speed improvement. Need to 
> expose this via Solr. When enabled, the numFound returned will not be exact.






[jira] [Commented] (SOLR-14351) Harden MDCLoggingContext.clear depth tracking

2020-05-02 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17098250#comment-17098250
 ] 

David Smiley commented on SOLR-14351:
-

Looks like I goofed in ZkContainer.java, where I removed the zkController != 
null check.  Consequently, in _standalone mode_, you'll see 
NullPointerException errors (actually benign) with a stack looking like:
{code}
at org.apache.solr.core.ZkContainer.lambda$registerInZk$1(ZkContainer.java:195)
at org.apache.solr.core.ZkContainer.registerInZk(ZkContainer.java:224)
{code}
etc.

I'll push a fix tomorrow with some updated javadocs on this ZkContainer class.
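
For context, a simplified, self-contained sketch of the guard in question 
(RegisterInZkSketch and ZkControllerStub are hypothetical stand-ins, not the 
actual Solr classes):

{code}
public class RegisterInZkSketch {

  // Stand-in for Solr's ZkController; null when running in standalone mode.
  static final class ZkControllerStub {
    void register() {
      System.out.println("registered core in ZooKeeper");
    }
  }

  private final ZkControllerStub zkController;

  RegisterInZkSketch(ZkControllerStub zkController) {
    this.zkController = zkController;
  }

  Runnable registerInZk() {
    return () -> {
      // The check removed by mistake: without it, standalone mode (where
      // zkController is null) throws a NullPointerException from the lambda.
      if (zkController == null) {
        return;
      }
      zkController.register();
    };
  }

  public static void main(String[] args) {
    new RegisterInZkSketch(null).registerInZk().run();                   // standalone: no-op, no NPE
    new RegisterInZkSketch(new ZkControllerStub()).registerInZk().run(); // cloud: registers
  }
}
{code}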

> Harden MDCLoggingContext.clear depth tracking
> -
>
> Key: SOLR-14351
> URL: https://issues.apache.org/jira/browse/SOLR-14351
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Minor
> Fix For: 8.6
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> MDCLoggingContext tracks recursive calls and only clears when the recursion 
> level is back down to 0.  If a caller forgets to register and ends up calling 
> clear anyway, then this can mess things up.  Additionally, I found at least 
> one place where this is occurring, which led me to investigate this matter.






[jira] [Commented] (LUCENE-9321) Port documentation task to gradle

2020-05-02 Thread Tomoko Uchida (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17098254#comment-17098254
 ] 

Tomoko Uchida commented on LUCENE-9321:
---

bq. Maybe we can dump all links by inserting 'print' and run the script.

I tried to dump all cross-module (relative) links with this patch to 
checkJavadocLinks.py:
{code}
diff --git a/dev-tools/scripts/checkJavadocLinks.py b/dev-tools/scripts/checkJavadocLinks.py
index 5d07e27a588..a96879536c9 100644
--- a/dev-tools/scripts/checkJavadocLinks.py
+++ b/dev-tools/scripts/checkJavadocLinks.py
@@ -74,6 +74,12 @@ class FindHyperlinks(HTMLParser):
     elif href is not None:
       assert name is None
       href = href.strip()
+      absolute_url = urlparse.urljoin(self.baseURL, href)
+      prefix1 = '/'.join(urlparse.urlparse(self.baseURL).path.split('/')[:5])
+      prefix2 = '/'.join(urlparse.urlparse(absolute_url).path.split('/')[:5])
+      # print only cross-module relative links
+      if re.match(r'^\.\./', href) and prefix1 != prefix2:
+        print('%s\t%s\t%s' % (self.baseURL, href, absolute_url))
       self.links.append(urlparse.urljoin(self.baseURL, href))
     elif id is None:
       raise RuntimeError('couldn\'t find an href nor name in link in %s: only got these attrs: %s' % (self.baseURL, attrs))
@@ -130,8 +136,9 @@ def checkAll(dirName):
   global failures

   # Find/parse all HTML files first
-  print()
-  print('Crawl/parse...')
+  #print()
+  #print('Crawl/parse...')
+  print('filename\trelative path\tabsolute url')
   allFiles = {}

   if os.path.isfile(dirName):
@@ -160,8 +167,8 @@ def checkAll(dirName):
     allFiles[fullPath] = parse(fullPath, open('%s/%s' % (root, f), encoding='UTF-8').read())

   # ... then verify:
-  print()
-  print('Verify...')
+  #print()
+  #print('Verify...')
   for fullPath, (links, anchors) in allFiles.items():
     #print fullPath
     printed = False
{code}

I don't want to attach the results (the output file is large), but the dump can 
be reproduced as below:
{code}
lucene-solr $ python -B dev-tools/scripts/checkJavadocLinks.py lucene/build/docs/ > ~/work/lucene-javadocs-relative-paths.tsv
lucene-solr $ wc -l ~/work/lucene-javadocs-relative-paths.tsv
31434 /home/moco/work/lucene-javadocs-relative-paths.tsv

lucene-solr $ python -B dev-tools/scripts/checkJavadocLinks.py solr/build/docs/ > ~/work/solr-javadocs-relative-paths.tsv
lucene-solr $ wc -l ~/work/solr-javadocs-relative-paths.tsv
9307 /home/moco/work/solr-javadocs-relative-paths.tsv
{code}

This includes both kinds of relative paths - links generated automatically by 
the javadoc tool and links written by hand (I don't know of a way to 
distinguish them). With the Gradle scripts on the current master, the number 
should be smaller, since all automatically generated links are absolute ones 
produced by the "renderJavadoc" task. 



> Port documentation task to gradle
> -
>
> Key: LUCENE-9321
> URL: https://issues.apache.org/jira/browse/LUCENE-9321
> Project: Lucene - Core
>  Issue Type: Sub-task
>  Components: general/build
>Reporter: Tomoko Uchida
>Assignee: Tomoko Uchida
>Priority: Major
> Attachments: screenshot-1.png
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> This is a placeholder issue for porting ant "documentation" task to gradle. 
> The generated documents should be able to be published on lucene.apache.org 
> web site on "as-is" basis.


