[jira] [Commented] (LUCENE-9321) Port documentation task to gradle
[ https://issues.apache.org/jira/browse/LUCENE-9321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17097858#comment-17097858 ] Dawid Weiss commented on LUCENE-9321: - bq. Instead of gathering all javadocs into one place and checking relative links, could we fix the linting script to make it work with per-project folders? In the script, I think we can also forbid anyone from adding any more relative links (which strengthen interdependencies between sub-projects). I agree with Tomoko here. An additional bonus of not having cross-project relative links is that javadocs displayed by IDEs work properly. The top-level index is a different matter because it is for site needs only (and there you can link to the javadocs of each package with relative links). > Port documentation task to gradle > - > > Key: LUCENE-9321 > URL: https://issues.apache.org/jira/browse/LUCENE-9321 > Project: Lucene - Core > Issue Type: Sub-task > Components: general/build >Reporter: Tomoko Uchida >Assignee: Tomoko Uchida >Priority: Major > Time Spent: 1h 20m > Remaining Estimate: 0h > > This is a placeholder issue for porting ant "documentation" task to gradle. > The generated documents should be able to be published on lucene.apache.org > web site on "as-is" basis. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
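As a rough illustration of what such a per-project link check could look like, here is a hypothetical Groovy sketch (this is not the existing linting script; the function name and regex are made up for illustration):

```
// Hypothetical per-project link check, not the actual documentation-lint code:
// flag relative href values that resolve outside the project's own javadoc folder.
def findCrossProjectLinks(File javadocDir) {
  def root = javadocDir.canonicalFile.toPath()
  def offenders = []
  javadocDir.eachFileRecurse { f ->
    if (!f.name.endsWith('.html')) return
    f.text.findAll(/href="([^"]+)"/) { full, href ->
      if (href =~ /^[a-z][a-z0-9+.-]*:/) return   // skip absolute URLs (https:, mailto:, ...)
      def resolved = new File(f.parentFile, href).canonicalFile.toPath()
      if (!resolved.startsWith(root)) offenders << "$f -> $href"
    }
  }
  offenders
}
```

A check of this shape would pass links within a module and flag any `../..` hop into a sibling project, which is exactly the dependency-strengthening pattern the quoted comment wants to forbid.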
[GitHub] [lucene-solr] gunasekhardora commented on a change in pull request #1371: SOLR-14333: print readable version of CollapsedPostFilter query
gunasekhardora commented on a change in pull request #1371: URL: https://github.com/apache/lucene-solr/pull/1371#discussion_r418926468 ## File path: solr/core/src/java/org/apache/solr/search/CollapsingQParserPlugin.java ## @@ -128,6 +128,28 @@ field collapsing (with ngroups) when the number of distinct groups public static final String HINT_TOP_FC = "top_fc"; public static final String HINT_MULTI_DOCVALUES = "multi_docvalues"; + public enum NullPolicy { +IGNORE("ignore", 0), Review comment: @madrob Removed them. Added a unit test to validate that an illegal null policy argument is rejected as well. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9278) Make javadoc folder structure follow Gradle project path
[ https://issues.apache.org/jira/browse/LUCENE-9278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17097862#comment-17097862 ] ASF subversion and git services commented on LUCENE-9278: - Commit 951efc95be338cab3f693c45a50a9e36a237743e in lucene-solr's branch refs/heads/master from Uwe Schindler [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=951efc9 ] LUCENE-9278: Improved options file creation: All parameters are escaped automatically, arguments don't need to be strings (they are converted during building options file) (#1479) > Make javadoc folder structure follow Gradle project path > > > Key: LUCENE-9278 > URL: https://issues.apache.org/jira/browse/LUCENE-9278 > Project: Lucene - Core > Issue Type: Task > Components: general/build >Reporter: Tomoko Uchida >Assignee: Tomoko Uchida >Priority: Major > Fix For: master (9.0) > > Time Spent: 7h 10m > Remaining Estimate: 0h > > The current javadoc folder structure is derived from the Ant project name, e.g.: > [https://lucene.apache.org/core/8_4_1/analyzers-icu/index.html] > [https://lucene.apache.org/solr/8_4_1/solr-solrj/index.html] > For the Gradle build, it should follow the Gradle project structure (path) > instead of the Ant one, to keep things simple to manage [1]. Hence, it will look > like this: > [https://lucene.apache.org/core/9_0_0/analysis/icu/index.html] > [https://lucene.apache.org/solr/9_0_0/solr/solrj/index.html] > [1] The change was suggested in a conversation between Dawid Weiss and me on > a GitHub PR: [https://github.com/apache/lucene-solr/pull/1304] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Created] (SOLR-14455) Autoscaling policy for ADDREPLICA not functioning in Metric Based Triggers
Sujith created SOLR-14455: - Summary: Autoscaling policy for ADDREPLICA not functioning in Metric Based Triggers Key: SOLR-14455 URL: https://issues.apache.org/jira/browse/SOLR-14455 Project: Solr Issue Type: Bug Security Level: Public (Default Security Level. Issues are Public) Reporter: Sujith The Autoscaling policy for ADDREPLICA is not functioning in Metric Based Triggers. The "preferredOperation" was set to "*ADDREPLICA*" for a sample metric trigger and it wasn't functioning. On the other hand, the operation MOVEREPLICA works as expected. I tried this in Solr version 7.5. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
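For context, a metric trigger of the shape being described would be registered against the autoscaling API roughly like this (an illustrative sketch; the trigger name, metric, and threshold are invented, not taken from the report):

```
{
  "set-trigger": {
    "name": "sample_metric_trigger",
    "event": "metric",
    "waitFor": "5s",
    "metric": "metrics:solr.node:CONTAINER.fs.coreRoot.usableSpace",
    "below": 107374182400,
    "preferredOperation": "ADDREPLICA"
  }
}
```

Per the report, with "preferredOperation" set to ADDREPLICA the trigger does not perform the operation, while the MOVEREPLICA operation works as expected.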
[GitHub] [lucene-solr] mocobeta edited a comment on pull request #1477: LUCENE-9321: Port markdown task to Gradle
mocobeta edited a comment on pull request #1477: URL: https://github.com/apache/lucene-solr/pull/1477#issuecomment-622866628 Just for clarification... Eventually, the `build/documentation` folder should look like this regardless of which way we choose - keep per-project javadoc outputs, or output docs into the top-level `documentation` folder from the beginning:

**Lucene**
```
lucene/build/documentation
├── JRE_VERSION_MIGRATION.html
├── MIGRATE.html
├── SYSTEM_REQUIREMENTS.html
├── index.html
├── lucene_green_300.gif
├── changes
├── analysis
│   ├── common
│   ├── icu
│   ├── kuromoji
│   ├── morfologik
│   ├── nori
│   ├── opennlp
│   ├── phonetic
│   ├── smartcn
│   └── stempel
├── backward-codecs
├── benchmark
├── classification
├── codecs
├── core
├── demo
├── expressions
├── facet
├── grouping
├── highlighter
├── join
├── luke
├── memory
├── misc
├── monitor
├── queries
├── queryparser
├── replicator
├── sandbox
├── spatial-extras
├── spatial3d
├── suggest
└── test-framework
```

**Solr**
```
solr/build/documentation
├── SYSTEM_REQUIREMENTS.html
├── index.html
├── changes
├── images
├── contrib
│   ├── analysis-extras
│   ├── analytics
│   ├── clustering
│   ├── dataimporthandler
│   ├── dataimporthandler-extras
│   ├── extraction
│   ├── jaegertracer-configurator
│   ├── langid
│   ├── ltr
│   ├── prometheus-exporter
│   └── velocity
├── core
├── solrj
└── test-framework
```

Each subproject's javadoc folder structure is consistent with the Gradle project path (as I emphasized on LUCENE-9278). Both `build/documentation` folders should be uploaded to the lucene.apache.org website on an as-is basis (that is the final purpose of the `documentation` task). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] dweiss commented on pull request #1477: LUCENE-9321: Port markdown task to Gradle
dweiss commented on pull request #1477: URL: https://github.com/apache/lucene-solr/pull/1477#issuecomment-622879655 Thanks for clarifying, Tomoko. I'm much in favor of keeping the javadocs in per-project build folders, but if Uwe insists this is a problem, would it be a large patch to build those docs under the target documentation location? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] mocobeta commented on pull request #1477: LUCENE-9321: Port markdown task to Gradle
mocobeta commented on pull request #1477: URL: https://github.com/apache/lucene-solr/pull/1477#issuecomment-622903717 > if Uwe insists this is a problem then would it be a large patch to build those docs under target documentation location? I think all we need to do is replace the `project.javadoc.destinationDir` variables in this file with the target documentation folder, `_docroot_/${pathToDocdir(project.path)}`. (For now the renderJavadoc task does not care where the final destination is, so we would have to teach the task the value of _docroot_ in some way.) https://github.com/apache/lucene-solr/blob/master/gradle/render-javadoc.gradle This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
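A minimal sketch of that idea in a Gradle (Groovy) script, assuming a hypothetical `docroot` location and `pathToDocdir` helper (neither exists in render-javadoc.gradle today; this is illustrative, not the actual build code):

```
// Hypothetical sketch: route each Lucene subproject's javadoc output into a
// shared top-level docroot, mirroring the Gradle project path
// (e.g. :lucene:analysis:icu -> analysis/icu, per LUCENE-9278).
def docroot = rootProject.file('lucene/build/documentation')

// ':lucene:analysis:icu'.split(':') -> ['', 'lucene', 'analysis', 'icu']
def pathToDocdir = { String projectPath ->
  projectPath.split(':').drop(2).join('/')
}

configure(subprojects.findAll { it.path.startsWith(':lucene:') }) {
  plugins.withType(JavaPlugin) {
    javadoc {
      destinationDir = new File(docroot, pathToDocdir(project.path))
    }
  }
}
```

Teaching the existing renderJavadoc task about the docroot, as the comment suggests, would amount to the same substitution of `destinationDir`.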
[jira] [Created] (SOLR-14456) Compressed requests fail in SolrCloud when the request is routed internally by the serving solr node
Samuel García Martínez created SOLR-14456: - Summary: Compressed requests fail in SolrCloud when the request is routed internally by the serving solr node Key: SOLR-14456 URL: https://issues.apache.org/jira/browse/SOLR-14456 Project: Solr Issue Type: Bug Security Level: Public (Default Security Level. Issues are Public) Components: SolrCloud Affects Versions: 7.7.2 Environment: Solr version: 7.7.2 Solr cloud enabled Cluster topology: 6 nodes, 1 single collection, 10 shards and 3 replicas. 1 HTTP LB using Round Robin over all nodes All cluster nodes have gzip enabled for all paths, all HTTP verbs and all MIME types. Solr client: HttpSolrClient targeting the HTTP LB Reporter: Samuel García Martínez

h3. Solr cluster setup
* Solr version: 7.7.2
* Solr cloud enabled
* Cluster topology: 6 nodes, 1 single collection, 10 shards and 3 replicas. 1 HTTP LB using Round Robin over all nodes
* All cluster nodes have gzip enabled for all paths, all HTTP verbs and all MIME types.
* Solr client: HttpSolrClient targeting the HTTP LB

h3. Problem description
When the Solr node that receives the request has to forward it to a Solr node that can actually perform the query, the response headers are added incorrectly to the response, causing any HTTP client to fail, whether it's a SolrClient or a basic HTTP client implementation with any other SDK. To simplify the case, let's start from the following repro scenario:
* Start one node in cloud mode on port 8983
* Create one single collection (1 shard, 1 replica)
* Start another node on port 8984 with the previously started ZooKeeper (-z localhost:9983)
* Start a Java application and query the cluster using the node on port 8984 (the one that doesn't host the collection)

Then something like this happens:
* The application queries node:8984 with compression enabled ("Accept-Encoding: gzip") and wt=javabin
* Node:8984 can't perform the query and creates an HTTP request behind the scenes to node:8983
* Node:8983 returns a gzipped response with "Content-Encoding: gzip" and "Content-Type: application/octet-stream"
* Node:8984 adds the "Content-Encoding: gzip" value to the response as if it were a character encoding (it should be forwarded as a "Content-Encoding" header, not as the charset)
* HttpSolrClient receives "Content-Type: application/octet-stream;charset=gzip", causing an exception.
* HttpSolrClient tries to quietly close the connection, but since the stream is broken, Utils.consumeFully fails to actually consume the entity (it throws another exception in GzipDecompressingEntity#getContent() with "not in GZIP format").

The exception thrown by HttpSolrClient is:
{code:java}
java.nio.charset.UnsupportedCharsetException: gzip
 at java.nio.charset.Charset.forName(Charset.java:531)
 at org.apache.http.entity.ContentType.create(ContentType.java:271)
 at org.apache.http.entity.ContentType.create(ContentType.java:261)
 at org.apache.http.entity.ContentType.parse(ContentType.java:319)
 at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:591)
 at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:255)
 at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:244)
 at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:194)
 at org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:1015)
 at org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:1031)
 at org.apache.solr.client.solrj.SolrClient$$FastClassBySpringCGLIB$$7fcf36a0.invoke()
 at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:218)
{code}
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
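The charset failure is easy to reproduce in isolation with the same HttpClient API that appears in the stack trace (a standalone snippet, not Solr code):

```
import org.apache.http.entity.ContentType

// The forwarding node effectively folds the encoding into the Content-Type
// header, so the client parses "charset=gzip" and asks the JDK for a charset
// named "gzip", which does not exist.
ContentType.parse('application/octet-stream;charset=gzip')
// -> java.nio.charset.UnsupportedCharsetException: gzip (as in the trace above)
```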
[jira] [Commented] (LUCENE-9087) Should the BKD tree use a fixed maxPointsInLeafNode?
[ https://issues.apache.org/jira/browse/LUCENE-9087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17097883#comment-17097883 ] ASF subversion and git services commented on LUCENE-9087: - Commit 96c47bc8508142b5bd11d2cdc492df380801efec in lucene-solr's branch refs/heads/master from Ignacio Vera [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=96c47bc ] LUCENE-9087: Build always trees with full leaves and lower the default value for maxPointsPerLeafNode to 512 > Should the BKD tree use a fixed maxPointsInLeafNode? > - > > Key: LUCENE-9087 > URL: https://issues.apache.org/jira/browse/LUCENE-9087 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Ignacio Vera >Priority: Major > Attachments: Study of BKD tree performance with different values for > max points per leaf.pdf > > Time Spent: 1.5h > Remaining Estimate: 0h > > Currently the BKD tree uses a fixed maxPointsInLeafNode provided in the > constructor. For the current default codec the value is set to 1024. This is > a good compromise between memory usage and performance of the BKD tree. > Lowering this value can increase search performance but it has a penalty in > memory usage. Now that the BKD tree can be loaded off-heap, this can be less of > a concern. Note that lowering that value too much can hurt performance as > well, as the tree becomes too deep and the benefits are gone. > For data types that use the tree as an effective R-tree (ranges and shapes > datatypes) the benefits are larger as it can minimise the overlap between > leaf nodes. > Finally, creating too many leaf nodes can be dangerous at write time as > memory usage depends on the number of leaf nodes created. The writer creates > a long array of length = numberOfLeafNodes. > What I am wondering here is if we can improve this situation in order to > create the most efficient tree. My current ideas are: > > * We can adapt the points per leaf depending on that number so we create a > tree with the best depth and best points per leaf. Note that for the 1D > case we have an upper estimation of the number of points that the tree will > be indexing. > * Add a mechanism so field types can easily define their best points per > leaf. In that case, field types like ranges or shapes can define their own value > to minimise overlap. > * Maybe the default is just too high now that we can load the tree off-heap. > Any thoughts? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
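Since the issue description notes that the writer allocates a long per leaf node, the memory effect of maxPointsPerLeafNode is easy to estimate. A back-of-the-envelope illustration (the point count is hypothetical, not taken from the issue):

```
// Illustrative arithmetic, not Lucene code: leaf count and the writer's
// long[] footprint for a given maxPointsPerLeafNode.
long numPoints = 1_000_000_000L                 // hypothetical point count
[1024, 512].each { int maxPointsPerLeaf ->
  long numLeaves = (numPoints + maxPointsPerLeaf - 1).intdiv(maxPointsPerLeaf)
  double mb = numLeaves * 8 / (1024 * 1024)     // one long (8 bytes) per leaf
  printf('maxPointsPerLeaf=%d -> leaves=%d, long[] ~ %.1f MB%n',
      maxPointsPerLeaf, numLeaves, mb)
}
```

Halving the leaf size doubles the leaf count (and the writer's array), which is the write-time cost the description weighs against the search-time benefit of shallower, less overlapping leaves.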
[GitHub] [lucene-solr] uschindler commented on pull request #1477: LUCENE-9321: Port markdown task to Gradle
uschindler commented on pull request #1477: URL: https://github.com/apache/lucene-solr/pull/1477#issuecomment-622927763 Hi, thanks for the discussion. @mocobeta explains it correctly: you just have to change one thing in render-javadoc to change the destination dir. > I was not aware that this task depends on projects' relative paths. To me, before proceeding we need to reach a consensus about the destination (output) directory for "renderJavadoc" anyway...? Actually this is my main concern. I will comment on this on the issue. I have an idea for that. The reason for the issue is that there are conflicting interests: the javadocs JAR on Maven Central vs. the documentation folder on the website and inside the tar.gz of the whole Lucene bundle. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9087) Should the BKD tree use a fixed maxPointsInLeafNode?
[ https://issues.apache.org/jira/browse/LUCENE-9087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17097887#comment-17097887 ] ASF subversion and git services commented on LUCENE-9087: - Commit 5a922c3c8523cd01fae4720a57132d12c20f1191 in lucene-solr's branch refs/heads/branch_8x from Ignacio Vera [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=5a922c3 ] LUCENE-9087: Build always trees with full leaves and lower the default value for maxPointsPerLeafNode to 512 > Should the BKD tree use a fixed maxPointsInLeafNode? > - > > Key: LUCENE-9087 > URL: https://issues.apache.org/jira/browse/LUCENE-9087 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Ignacio Vera >Priority: Major > Attachments: Study of BKD tree performance with different values for > max points per leaf.pdf > > Time Spent: 1.5h > Remaining Estimate: 0h > > Currently the BKD tree uses a fixed maxPointsInLeafNode provided in the > constructor. For the current default codec the value is set to 1024. This is > a good compromise between memory usage and performance of the BKD tree. > Lowering this value can increase search performance but it has a penalty in > memory usage. Now that the BKD tree can be loaded off-heap, this can be less of > a concern. Note that lowering that value too much can hurt performance as > well, as the tree becomes too deep and the benefits are gone. > For data types that use the tree as an effective R-tree (ranges and shapes > datatypes) the benefits are larger as it can minimise the overlap between > leaf nodes. > Finally, creating too many leaf nodes can be dangerous at write time as > memory usage depends on the number of leaf nodes created. The writer creates > a long array of length = numberOfLeafNodes. > What I am wondering here is if we can improve this situation in order to > create the most efficient tree. My current ideas are: > > * We can adapt the points per leaf depending on that number so we create a > tree with the best depth and best points per leaf. Note that for the 1D > case we have an upper estimation of the number of points that the tree will > be indexing. > * Add a mechanism so field types can easily define their best points per > leaf. In that case, field types like ranges or shapes can define their own value > to minimise overlap. > * Maybe the default is just too high now that we can load the tree off-heap. > Any thoughts? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (LUCENE-9321) Port documentation task to gradle
[ https://issues.apache.org/jira/browse/LUCENE-9321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-9321: -- Attachment: screenshot-1.png > Port documentation task to gradle > - > > Key: LUCENE-9321 > URL: https://issues.apache.org/jira/browse/LUCENE-9321 > Project: Lucene - Core > Issue Type: Sub-task > Components: general/build >Reporter: Tomoko Uchida >Assignee: Tomoko Uchida >Priority: Major > Attachments: screenshot-1.png > > Time Spent: 2h 10m > Remaining Estimate: 0h > > This is a placeholder issue for porting ant "documentation" task to gradle. > The generated documents should be able to be published on lucene.apache.org > web site on "as-is" basis. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14453) Solr proximity search highlighting issue
[ https://issues.apache.org/jira/browse/SOLR-14453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17097895#comment-17097895 ] amit naliyapara commented on SOLR-14453: I tried the unified highlighting method, but it is not working. > Solr proximity search highlighting issue > > > Key: SOLR-14453 > URL: https://issues.apache.org/jira/browse/SOLR-14453 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: highlighter >Affects Versions: 8.4.1 >Reporter: amit naliyapara >Priority: Major > Attachments: Highlighted-response.PNG, Not-Highlighted-response.PNG, > managed-schema, solr-doc-Id-1.txt > > I found a problem in the highlighting module: not all of the search terms are > getting highlighted. > Sample query: q={!complexphrase+inOrder=true}"pos1 (pos2 OR pos3)"~30&hl=true > Indexed text: "pos1 pos2 pos3 pos4" > You can see that only two terms are highlighted, like "pos1 > pos2 pos3 pos4" > Please find the attached Not-Highlighted-response screenshot for the same. > The scenario occurs when the term positions are in order in both the document and the query. > If the term positions are not in order, then it works properly. > Sample query: q={!complexphrase+inOrder=false}"pos3 (pos1 OR pos2)"~30&hl=true > You can see that all three terms are highlighted, like "pos1 > pos2 pos3 pos4" > Please find the attached Highlighted-response screenshot for the same. > The behavior has been the same in the Solr source code for a long time (I have > checked Solr versions 4 through 7). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] uschindler commented on pull request #1477: LUCENE-9321: Port markdown task to Gradle
uschindler commented on pull request #1477: URL: https://github.com/apache/lucene-solr/pull/1477#issuecomment-622931376 See my lengthy comment here: https://issues.apache.org/jira/browse/LUCENE-9321?focusedCommentId=17097899&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17097899 Please read it carefully and in full! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9321) Port documentation task to gradle
[ https://issues.apache.org/jira/browse/LUCENE-9321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17097899#comment-17097899 ] Uwe Schindler commented on LUCENE-9321: --- Hi, as said on the PR, there are actually two (or rather three) different ways to consume the javadocs, and each of them has different requirements for inter-module links:
# One consumer is using the javadocs from the Maven JAR files (e.g., in the IDE)
# Other consumers are using the website, where the javadocs are all in one place. They do this not only for releases, but also for the snapshot builds by Jenkins (see below; this is what makes the current "absolute" links not work the way they are set up).
# Somebody downloads the tar.gz file of Lucene and wants to browse the javadocs from there. Actually, I do this all the time when I am validating a release (because that's the only way to do it, as the javadocs are not yet deployed on the central web page).

Consumer #1 is perfectly fine with the current setup. For us it's easy to package. The only thing that is currently broken is the way the absolute links are generated: they are hardcoded!!! This cannot stay like that. We have nightly snapshot builds on Jenkins where we produce snapshots whose Javadoc links all point nowhere. In the ANT build this is handled by making the "Documentation base URL" configurable for Lucene/Solr: instead of hardcoding {{[https://lucene.apache.org/lucene/a_b_c]}}, the Jenkins server sets a property on the ANT invocation. By that, all absolute links are correct. A release manager can also set this, but there's currently an automatism in ANT: if the version does not end in "-SNAPSHOT", the absolute links are generated using the version number. We have version.properties for that. This is how Jenkins (the Solr 8.x job) is set up; the same should be possible for Gradle (just define the "base URL path" with 2 properties): !screenshot-1.png|width=753,height=316! This allows browsing the full documentation here: [https://builds.apache.org/view/L/view/Lucene/job/Solr-Artifacts-8.x/javadoc/] (including valid absolute links, also cross-project to Lucene). All snapshot artifacts deployed on snapshots.apache.org (including ZIP files) have those links inside. This makes it easy for the user to browse, and also for somebody using the artifacts in their IDE (think of Elasticsearch or any other project using snapshot artifacts from the ASF). They are perfectly fine; it's now even better than before!

Now comes user #3: he downloads the tar.gz/zip file and wants to browse the Javadocs, or he is the development member who votes for a release and wants to view the javadocs. Unfortunately he can't, as all links are dead (the Javadocs are not yet published). Also, somebody who downloaded the tar.gz file wants to dive through the documentation with *relative* links. Just copying or symlinking all Javadocs to some central folder doesn't satisfy that.

User #2 is somewhere in between, but I tend to treat him identically to user #3. I don't like publishing HTML pages on lucene.apache.org with absolute links to lucene.apache.org. We recently changed to HTTPS, so for similar cases all links in historic Javadocs would need to be rewritten. Thanks to redirects it still works, but there can be man-in-the-middle problems. I wanted to download the whole SVN repository in the near future and run a {{sed}} through it to fix all old links. This is major work. If links are all relative, you don't have that problem.

bq. Other linting tasks in ant's "documentation-lint", ecjLint and checkMissingDocs work fine with per-project javadoc folders.

They work because documentation-lint does not check everything. The linter does not follow absolute links, so it can't verify them; it just passes. It's OK to check that all links within a module are correct, but that can't check the full documentation. So before a release, "documentation-lint" must also be run at the top level. This is a requirement for the release. But for this to work, the links must be relative.

*Now comes my proposal:*
- I tend to leave the per-project javadocs as they are; they should be used to build the Maven artifacts. This makes IDE users happy, and I hope also Dawid. The only thing is to allow configuring the Lucene- and Solr-specific "base" URL for absolute links. This allows building snapshot artifacts on Jenkins correctly. Maybe also copy the "heuristic" from Ant to generate links based on whether the version ends in "-SNAPSHOT" or not.
- For the website and the .tar.gz release (so, packaging), the release manager should run the whole javadocs a second time (we should *not* copy them). For this second run for packaging purposes, we change the Javadocs output directory to the top-level one (as proposed by Tomoko). In addition, the absolute links should be relative. This can easily be done using the java.net.URI class. Just build the absolut
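The comment is cut off above; the java.net.URI idea it is describing can be sketched as follows (the URLs are illustrative, not the build's actual configuration values):

```
// Sketch of relativizing an absolute javadoc link against a base URL using
// java.net.URI. Note that relativize() only strips a common prefix; it never
// emits "../" segments, so the base must be an ancestor of the target.
def base   = new URI('https://lucene.apache.org/core/9_0_0/')
def target = new URI('https://lucene.apache.org/core/9_0_0/analysis/icu/index.html')
assert base.relativize(target).toString() == 'analysis/icu/index.html'
```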
[jira] [Created] (SOLR-14457) SolrClient leaks a connection forever when an unexpected/malformed Entity is received
Samuel García Martínez created SOLR-14457: - Summary: SolrClient leaks a connection forever when an unexpected/malformed Entity is received Key: SOLR-14457 URL: https://issues.apache.org/jira/browse/SOLR-14457 Project: Solr Issue Type: Bug Security Level: Public (Default Security Level. Issues are Public) Components: SolrJ Affects Versions: 7.7.2 Environment: Solr version: 7.7.2 Solr cloud enabled Cluster topology: 6 nodes, 1 single collection, 10 shards and 3 replicas. 1 HTTP LB using Round Robin over all nodes All cluster nodes have gzip enabled for all paths, all HTTP verbs and all MIME types. Solr client: HttpSolrClient targeting the HTTP LB Reporter: Samuel García Martínez When SolrJ receives a malformed response Entity, for example like the one described in SOLR-14456, the client leaks the connection forever as it's never released back to the pool. If Solr (for whatever reason) or any intermediate networking piece (firewall, proxy, load balancer) messes up the response, SolrJ tries to release the connection, but GzipDecompressingEntity#getContent fails with an IOException("Not in GZIP format"), making it impossible to release the connection. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14457) SolrClient leaks a connection forever when an unexpected/malformed Entity is received
[ https://issues.apache.org/jira/browse/SOLR-14457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17097902#comment-17097902 ] Samuel García Martínez commented on SOLR-14457: --- Scenarios that corrupt the response, like SOLR-14456, break the connection management > SolrClient leaks a connection forever when an unexpected/malformed Entity is > received > - > > Key: SOLR-14457 > URL: https://issues.apache.org/jira/browse/SOLR-14457 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: SolrJ >Affects Versions: 7.7.2 > Environment: Solr version: 7.7.2 > Solr cloud enabled > Cluster topology: 6 nodes, 1 single collection, 10 shards and 3 replicas. 1 > HTTP LB using > Round Robin over all nodes > All cluster nodes have gzip enabled for all paths, all HTTP verbs and all > MIME types. > Solr client: HttpSolrClient targeting the HTTP LB >Reporter: Samuel García Martínez >Priority: Major > > When SolrJ receives a malformed response Entity, for example like the one > described in SOLR-14456, the client leaks the connection forever as it's > never released back to the pool. > If Solr (for whatever reason) or any intermediate networking piece (firewall, > proxy, load balancer) messes up the response, SolrJ tries to release the > connection, but GzipDecompressingEntity#getContent fails with an > IOException("Not in GZIP format"), making it impossible to release the > connection. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (SOLR-14457) SolrClient leaks a connection forever when an unexpected/malformed Entity is received
[ https://issues.apache.org/jira/browse/SOLR-14457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Samuel García Martínez updated SOLR-14457: -- Description: When SolrJ receives a malformed response Entity, for example like the one described in SOLR-14456, the client leaks the connection forever as it's never released back to the pool. If Solr (for whatever reason) or any intermediate networking piece (firewall, proxy, load balancer) messes up the response, SolrJ tries to release the connection, but GzipDecompressingEntity#getContent fails with an IOException("Not in GZIP format"), making it impossible to release the connection. On top of the bug itself, not being able to set a timeout while waiting for a connection to be available makes any application unresponsive, as it will eventually run out of threads. was: When SolrJ receives a malformed response Entity, for example like the one described in SOLR-14456, the client leaks the connection forever as it's never released back to the pool. If Solr (for whatever reason) or any intermediate networking piece (firewall, proxy, load balancer) messes up the response, SolrJ tries to release the connection, but GzipDecompressingEntity#getContent fails with an IOException("Not in GZIP format"), making it impossible to release the connection. > SolrClient leaks a connection forever when an unexpected/malformed Entity is > received > - > > Key: SOLR-14457 > URL: https://issues.apache.org/jira/browse/SOLR-14457 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: SolrJ >Affects Versions: 7.7.2 > Environment: Solr version: 7.7.2 > Solr cloud enabled > Cluster topology: 6 nodes, 1 single collection, 10 shards and 3 replicas. 1 > HTTP LB using > Round Robin over all nodes > All cluster nodes have gzip enabled for all paths, all HTTP verbs and all > MIME types. > Solr client: HttpSolrClient targeting the HTTP LB >Reporter: Samuel García Martínez >Priority: Major > > When SolrJ receives a malformed response Entity, for example like the one > described in SOLR-14456, the client leaks the connection forever as it's > never released back to the pool. > If Solr (for whatever reason) or any intermediate networking piece (firewall, > proxy, load balancer) messes up the response, SolrJ tries to release the > connection, but GzipDecompressingEntity#getContent fails with an > IOException("Not in GZIP format"), making it impossible to release the > connection. > On top of the bug itself, not being able to set a timeout while waiting for a > connection to be available makes any application unresponsive, as it will > eventually run out of threads. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (SOLR-14457) SolrClient leaks connections on compressed responses if the response is malformed
[ https://issues.apache.org/jira/browse/SOLR-14457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Samuel García Martínez updated SOLR-14457: -- Summary: SolrClient leaks connections on compressed responses if the response is malformed (was: SolrClient leaks a connection forever when an unexpected/malformed Entity is received) > SolrClient leaks connections on compressed responses if the response is > malformed > - > > Key: SOLR-14457 > URL: https://issues.apache.org/jira/browse/SOLR-14457 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: SolrJ >Affects Versions: 7.7.2 > Environment: Solr version: 7.7.2 > Solr cloud enabled > Cluster topology: 6 nodes, 1 single collection, 10 shards and 3 replicas. 1 > HTTP LB using > Round Robin over all nodes > All cluster nodes have gzip enabled for all paths, all HTTP verbs and all > MIME types. > Solr client: HttpSolrClient targeting the HTTP LB >Reporter: Samuel García Martínez >Priority: Major > > When SolrJ receives a malformed response Entity, for example like the one > described in SOLR-14456, the client leaks the connection forever as it's > never released back to the pool. > If Solr (for whatever reason) or any intermediate networking piece (firewall, > proxy, load balancer) messes up the response, SolrJ tries to release the > connection, but GzipDecompressingEntity#getContent fails with an > IOException("Not in GZIP format"), making it impossible to release the > connection. > On top of the bug itself, not being able to set a timeout while waiting for a > connection to be available makes any application unresponsive, as it will > eventually run out of threads. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-9321) Port documentation task to gradle
[ https://issues.apache.org/jira/browse/LUCENE-9321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17097899#comment-17097899 ] Uwe Schindler edited comment on LUCENE-9321 at 5/2/20, 10:33 AM: - Hi, as said on the PR, there is actually two (or better three) different ways how to consume the javadocs and each of those have different requirements with inter-module links: # One consumer is using the javadocs from the Maven JAR files (e.g., in the IDE) # The other consumers are using the website where the javadocs are all at one place. They not only do this for releases, but also for snapshot builds by Jenkins (see below, this makes the current "absolute" links not working they way how it is setup). # Somebody downloads the tar.gz file of Lucene and wants to browse the javadocs from there. Actually I do this all the time when I am validating a release (because that's the only way to do this, as the javadocs are not yet deployed on central web page). The consumer #1 is perfectly fine with current setup. For us it's easy to package. The only things that is currenty borken is the way how the absolute links are generated: They are hardcoded!!! This cannot be like that. We have nightly snapshot builds on Jenkins where we producce snapshots where all Javadocs go into nowhere. In the ANT build this is handled by making the "Documentation base URL" configurable for Lucene/Solr: Instead of hardcoding {{[https://lucene.apache.org/lucene/a_b_c]}} the Jenkins server sets a property on the ANT invocation. By that all links which are absolute are correct. A release manager can also set this, but there's currently automatism in ANT: If the version does not end in "-SNAPSHOT", the links are generated using the absolute links using the version number. We have version.properties for that. This is how Jenkins (Solr 8.x Job) is setup, the same should be possible for Gradle (just define "base URL path" with 2 properties): !screenshot-1.png|width=753,height=316! This allows to browse the full documentation here: [https://builds.apache.org/view/L/view/Lucene/job/Solr-Artifacts-8.x/javadoc/] (including valid absolute links also cross-project to Lucene). All Snapshot artifacts deployed on snapshots.apache.org (including ZIP files) have those links inside. This makes it easy for the user to browse and also somebody using the artifacts in his IDE (think about Elasticsearch or any other projects using snapshot artifacts from ASF). They are perfectly fine, it's now also better than before! No comes user #3: He downloads the targz/zip file and wants to browser Javadocs or the development member who votes for a release. He wants to show the javadocs. Unfortunately he can't as all links are dead (the Javadocs are not yet published). Also somebody who downloaded the tar.gz file wants to dive through the documentation with *relative* links. With just copying or symlinking all Javadocs to some central folder, this isn't satisfied. User #2 is somehow inbetween, but I tend to make him identical to user #3. I don't like it to publish HTML pages on lucene.apache.org with absolute links to lucene.apache.org. We recently changed to HTTPS, so for similar cases all links in historic Javadocs would need to be rewritten. Thanks to redirects it still works, but there can be man-in-the-middle problems. I wanted to download the whole SVN repository in the near future and let run a {{sed}} through it to fix all old links. This is major work. If links are all relative, you don't have that problem. bq. 
Other linting tasks in ant's "documentation-lint", ecjLint and checkMissingDocs, work fine with a per-project javadoc folder. They work because documentation-lint does not check everything: the linter does not follow absolute links, so it can't verify them; it just passes. It's OK to check that all links within a module are correct, but that can't check the full documentation. So before a release, "documentation-lint" must also be run at the top level. This is a requirement for the release. But for this to work, the links must be relative. *Now comes my proposal:* - I tend to leave the per-project javadocs as is; they should be used to build the Maven artifacts. This makes IDE users happy, and I hope Dawid too. The only thing needed is to allow configuring the Lucene- and Solr-specific "base" URL for absolute links. This makes it possible to build snapshot artifacts on Jenkins correctly. Maybe also copy the "heuristic" from Ant that generates links depending on whether the version ends in "-SNAPSHOT" or not. - For the website and the .tar.gz release (so, packaging), the release manager should run the whole javadocs a second time (we should *not* copy them). For this second run, for packaging purposes, we change the javadoc output directory to the top-level one (as proposed by Tomoko). In addition, the absolute links should be made relative. This can easily be done
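To illustrate the "-SNAPSHOT" heuristic described above, here is a minimal Java sketch; the method name {{docBaseUrl}} and the {{explicitBaseUrl}} parameter are hypothetical stand-ins for the Ant property, not actual build code:
{code:java}
// Sketch only: choose the documentation base URL the way the Ant build does.
static String docBaseUrl(String version, String explicitBaseUrl) {
  if (explicitBaseUrl != null) {
    // Jenkins (or a release manager) overrides the base URL explicitly.
    return explicitBaseUrl;
  }
  if (version.endsWith("-SNAPSHOT")) {
    // Snapshot builds have no published javadoc location to point at.
    throw new IllegalStateException("snapshot builds must set a base URL");
  }
  // Releases derive the absolute link from the version, e.g. 9.0.0 -> 9_0_0.
  return "https://lucene.apache.org/core/" + version.replace('.', '_') + "/";
}
{code}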
[jira] [Updated] (SOLR-14457) SolrClient leaks connections on compressed responses if the response is malformed
[ https://issues.apache.org/jira/browse/SOLR-14457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Samuel García Martínez updated SOLR-14457: -- Description: h3. Summary When SolrJ receives a malformed response entity, for example the one described in SOLR-14456, the client leaks the connection forever as it's never released back to the pool. h3. Problem description HttpSolrClient must have compression enabled for this to happen, so that it uses the compression interceptors. When the response is marked with "Content-Encoding: gzip" but for whatever reason the body is not actually in GZIP format, HttpSolrClient tries to close the connection using Utils.consumeFully(), which tries to create the GzipInputStream, fails and throws an exception. The exception thrown makes it impossible to access the underlying InputStream to be closed; therefore the connection is leaked. Even though the content in the response should honour the headers specified for it, SolrJ should be robust enough to prevent the connection leak when this scenario happens. On top of the bug itself, not being able to set a timeout while waiting for a connection to be available makes any application unresponsive, as it will eventually run out of threads. was: When the SolrJ receives a malformed response Entity, for example like the one described in SOLR-14456, the client leaks the connection forever as it's never released back to the pool. If Solr (for whatever reason) or any intermediate networking piece (firewall, proxy, load balancer) messes up the response, SolrJ tries to release the connection but GzipDecompressingEntity#getContent fails with an IOException("Not in GZIP format"), making it impossible to release the connection. On top of the bug itself, not being able to set a timeout while waiting for a connection to be available, makes any application unresponsive as it will run out of threads eventually. > SolrClient leaks connections on compressed responses if the response is > malformed > - > > Key: SOLR-14457 > URL: https://issues.apache.org/jira/browse/SOLR-14457 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: SolrJ >Affects Versions: 7.7.2 > Environment: Solr version: 7.7.2 > Solr cloud enabled > Cluster topology: 6 nodes, 1 single collection, 10 shards and 3 replicas. 1 > HTTP LB using > Round Robin over all nodes > All cluster nodes have gzip enabled for all paths, all HTTP verbs and all > MIME types. > Solr client: HttpSolrClient targeting the HTTP LB >Reporter: Samuel García Martínez >Priority: Major > > h3. Summary > When SolrJ receives a malformed response entity, for example the one > described in SOLR-14456, the client leaks the connection forever as it's > never released back to the pool. > h3. Problem description > HttpSolrClient must have compression enabled for this to happen, so that it > uses the compression interceptors. > When the response is marked with "Content-Encoding: gzip" but for whatever > reason the body is not actually in GZIP format, HttpSolrClient tries to > close the connection using Utils.consumeFully(), which tries to create the > GzipInputStream, fails and throws an exception. The exception thrown makes > it impossible to access the underlying InputStream to be closed; therefore > the connection is leaked. > Even though the content in the response should honour the headers specified > for it, SolrJ should be robust enough to prevent the connection leak when > this scenario happens. 
On top of the bug itself, not being able to set a > timeout while waiting for a connection to be available makes any application > unresponsive, as it will eventually run out of threads. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
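For context, a minimal sketch of the kind of defensive cleanup the report asks for, built on Apache HttpClient's public {{EntityUtils}} API; the helper name matches the description above, but the {{connectionHolder}} parameter is a hypothetical illustration, not the actual SolrJ fix:
{code:java}
import java.io.Closeable;
import org.apache.http.HttpEntity;
import org.apache.http.util.EntityUtils;

// Sketch: a broken entity must never prevent the connection from being
// released back to the pool.
static void consumeFully(HttpEntity entity, Closeable connectionHolder) {
  try {
    // Normal path: drains the stream, which releases the connection.
    EntityUtils.consume(entity);
  } catch (Exception e) {
    // getContent() failed (e.g. "Not in GZIP format"), so the stream cannot
    // be drained; close the holder to evict the connection instead of
    // leaking it forever.
    try {
      connectionHolder.close();
    } catch (Exception ignored) {
      // nothing more we can do here
    }
  }
}
{code}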
[GitHub] [lucene-solr] romseygeek commented on a change in pull request #1467: LUCENE-9350: Don't hold references to large automata on FuzzyQuery
romseygeek commented on a change in pull request #1467: URL: https://github.com/apache/lucene-solr/pull/1467#discussion_r418947150 ## File path: lucene/core/src/java/org/apache/lucene/search/FuzzyQuery.java ## @@ -183,7 +162,7 @@ public void visit(QueryVisitor visitor) { if (maxEdits == 0 || prefixLength >= term.text().length()) { visitor.consumeTerms(this, term); } else { -automata[automata.length - 1].visit(visitor, this, field); +visitor.consumeTermsMatching(this, term.field(), () -> getAutomata().runAutomaton); Review comment: Only if the visitor implementation actually needs it; we're passing a `Supplier` now. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
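As a hedged aside, the general pattern in play here is lazy memoization behind a `Supplier`: the expensive automaton is only built if a visitor actually calls `get()`. A self-contained sketch of that pattern (illustrative only, not the FuzzyQuery code):
```java
import java.util.function.Supplier;

// Minimal lazy supplier: the delegate runs at most once, and only when a
// consumer actually asks for the value.
final class Lazy<T> implements Supplier<T> {
  private final Supplier<T> delegate;
  private T value; // cached after first get(); sketch is not thread-safe

  Lazy(Supplier<T> delegate) {
    this.delegate = delegate;
  }

  @Override
  public T get() {
    if (value == null) {
      value = delegate.get(); // pay the construction cost lazily
    }
    return value;
  }
}
```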
[jira] [Resolved] (SOLR-14455) Autoscaling policy for ADDREPLICA not functioning in Metric Based Triggers
[ https://issues.apache.org/jira/browse/SOLR-14455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erick Erickson resolved SOLR-14455. --- Resolution: Incomplete Please raise questions like this on the user's list. There's not enough information here to have any hope of reproducing it, and a discussion on the user's list should clarify exactly what your setup is and whether it's a bug or a misunderstanding on your part. See: http://lucene.apache.org/solr/community.html#mailing-lists-irc there are links to both Lucene and Solr mailing lists there. A _lot_ more people will see your question on that list and may be able to help more quickly. You might want to review: https://wiki.apache.org/solr/UsingMailingLists If it's determined that this really is a code issue or enhancement to Lucene or Solr and not a configuration/usage problem, we can raise a new JIRA or reopen this one. > Autoscaling policy for ADDREPLICA not functioning in Metric Based Triggers > -- > > Key: SOLR-14455 > URL: https://issues.apache.org/jira/browse/SOLR-14455 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Sujith >Priority: Major > > The Autoscaling policy for ADDREPLICA is not functioning in Metric Based > Triggers. The "preferredOperation" was given "*ADDREPLICA*" for a sample > metric trigger and it wasn't functioning. However, on the other hand, the > operation MOVEREPLICA is working as expected. I tried this in Solr version 7.5 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9355) missing releases from testbackwardscompatibility
[ https://issues.apache.org/jira/browse/LUCENE-9355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17097940#comment-17097940 ] Adrien Grand commented on LUCENE-9355: -- [~noble.paul] The smoke tester jobs have been failing since April 27th, see e.g. this build failure: https://builds.apache.org/view/L/view/Lucene/job/Lucene-Solr-SmokeRelease-8.x/420/console. The procedure is documented at https://cwiki.apache.org/confluence/display/LUCENE/ReleaseTodo#ReleaseTodo-GenerateBackcompatIndexes. I see that the release hasn't been announced yet either, as Jan pointed out on the dev list; it looks like you haven't completed all the release steps? > missing releases from testbackwardscompatibility > > > Key: LUCENE-9355 > URL: https://issues.apache.org/jira/browse/LUCENE-9355 > Project: Lucene - Core > Issue Type: Test >Reporter: Mike Drob >Priority: Major > > I'm not sure what needs to be added for the 7.7.3 release, but can you take a > look at it [~noble] or figure out who to ask for help? > {noformat} >[smoker] confirm all releases have coverage in TestBackwardsCompatibility >[smoker] find all past Lucene releases... >[smoker] run TestBackwardsCompatibility.. >[smoker] Releases that don't seem to be tested: >[smoker] 7.7.3 >[smoker] Traceback (most recent call last): >[smoker] File > "/home/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-master/dev-tools/scripts/smokeTestRelease.py", > line 1487, in >[smoker] main() >[smoker] File > "/home/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-master/dev-tools/scripts/smokeTestRelease.py", > line 1413, in main >[smoker] downloadOnly=c.download_only) >[smoker] File > "/home/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-master/dev-tools/scripts/smokeTestRelease.py", > line 1465, in smokeTest >[smoker] unpackAndVerify(java, 'lucene', tmpDir, 'lucene-%s-src.tgz' % > version, gitRevision, version, testArgs, baseURL) >[smoker] File > "/home/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-master/dev-tools/scripts/smokeTestRelease.py", > line 566, in unpackAndVerify >[smoker] verifyUnpacked(java, project, artifact, unpackPath, > gitRevision, version, testArgs, tmpDir, baseURL) >[smoker] File > "/home/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-master/dev-tools/scripts/smokeTestRelease.py", > line 752, in verifyUnpacked >[smoker] confirmAllReleasesAreTestedForBackCompat(version, unpackPath) >[smoker] File > "/home/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-master/dev-tools/scripts/smokeTestRelease.py", > line 1388, in confirmAllReleasesAreTestedForBackCompat >[smoker] raise RuntimeError('some releases are not tested by > TestBackwardsCompatibility?') >[smoker] RuntimeError: some releases are not tested by > TestBackwardsCompatibility? > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9355) missing releases from testbackwardscompatibility
[ https://issues.apache.org/jira/browse/LUCENE-9355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17097961#comment-17097961 ] Jan Høydahl commented on LUCENE-9355: - All I can say is that the RM job requires careful attention to each and every step. I recommend the releaseWizard since it helps keep a checklist of what is completed and what remains. One of the steps should be to add back-compat coverage to later releases. > missing releases from testbackwardscompatibility > > > Key: LUCENE-9355 > URL: https://issues.apache.org/jira/browse/LUCENE-9355 > Project: Lucene - Core > Issue Type: Test >Reporter: Mike Drob >Priority: Major > > I'm not sure what needs to be added for the 7.7.3 release, but can you take a > look at it [~noble] or figure out who to ask for help? > {noformat} >[smoker] confirm all releases have coverage in TestBackwardsCompatibility >[smoker] find all past Lucene releases... >[smoker] run TestBackwardsCompatibility.. >[smoker] Releases that don't seem to be tested: >[smoker] 7.7.3 >[smoker] Traceback (most recent call last): >[smoker] File > "/home/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-master/dev-tools/scripts/smokeTestRelease.py", > line 1487, in >[smoker] main() >[smoker] File > "/home/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-master/dev-tools/scripts/smokeTestRelease.py", > line 1413, in main >[smoker] downloadOnly=c.download_only) >[smoker] File > "/home/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-master/dev-tools/scripts/smokeTestRelease.py", > line 1465, in smokeTest >[smoker] unpackAndVerify(java, 'lucene', tmpDir, 'lucene-%s-src.tgz' % > version, gitRevision, version, testArgs, baseURL) >[smoker] File > "/home/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-master/dev-tools/scripts/smokeTestRelease.py", > line 566, in unpackAndVerify >[smoker] verifyUnpacked(java, project, artifact, unpackPath, > gitRevision, version, testArgs, tmpDir, baseURL) >[smoker] File > "/home/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-master/dev-tools/scripts/smokeTestRelease.py", > line 752, in verifyUnpacked >[smoker] confirmAllReleasesAreTestedForBackCompat(version, unpackPath) >[smoker] File > "/home/jenkins/jenkins-slave/workspace/Lucene-Solr-SmokeRelease-master/dev-tools/scripts/smokeTestRelease.py", > line 1388, in confirmAllReleasesAreTestedForBackCompat >[smoker] raise RuntimeError('some releases are not tested by > TestBackwardsCompatibility?') >[smoker] RuntimeError: some releases are not tested by > TestBackwardsCompatibility? > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Created] (LUCENE-9356) Add tests for corruptions caused by byte flips
Adrien Grand created LUCENE-9356: Summary: Add tests for corruptions caused by byte flips Key: LUCENE-9356 URL: https://issues.apache.org/jira/browse/LUCENE-9356 Project: Lucene - Core Issue Type: Test Reporter: Adrien Grand We already have tests that verify that file truncation and modification of the index headers are caught correctly. I'd like to add another test verifying that flipping a byte in a way that modifies the checksum of the file is always caught gracefully by Lucene. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
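A rough sketch of what such a test could look like, using only public {{Directory}} and {{CodecUtil}} APIs and assuming it runs inside a {{LuceneTestCase}} with {{dir}} and {{fileName}} already in scope; this illustrates the idea and is not the test that ends up being committed:
{code:java}
// Sketch: flip one bit in an index file, then expect checksum verification
// to fail. (In the astronomically unlikely case the flip leaves the stored
// and computed checksums consistent, the test would need to retry.)
byte[] bytes;
try (IndexInput in = dir.openInput(fileName, IOContext.READONCE)) {
  bytes = new byte[(int) in.length()];
  in.readBytes(bytes, 0, bytes.length);
}
bytes[random().nextInt(bytes.length)] ^= 0x01; // flip a single bit
dir.deleteFile(fileName);
try (IndexOutput out = dir.createOutput(fileName, IOContext.DEFAULT)) {
  out.writeBytes(bytes, 0, bytes.length);
}
expectThrows(CorruptIndexException.class, () -> {
  try (IndexInput in = dir.openInput(fileName, IOContext.READONCE)) {
    CodecUtil.checksumEntireFile(in);
  }
});
{code}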
[GitHub] [lucene-solr] janhoy commented on pull request #341: SOLR-12131: ExternalRoleRuleBasedAuthorizationPlugin
janhoy commented on pull request #341: URL: https://github.com/apache/lucene-solr/pull/341#issuecomment-622957422 I believe it is up to date with master now. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] janhoy commented on a change in pull request #341: SOLR-12131: ExternalRoleRuleBasedAuthorizationPlugin
janhoy commented on a change in pull request #341: URL: https://github.com/apache/lucene-solr/pull/341#discussion_r418962126 ## File path: solr/core/src/java/org/apache/solr/security/RuleBasedAuthorizationPluginBase.java ## @@ -0,0 +1,270 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.solr.security; + +import java.io.IOException; +import java.lang.invoke.MethodHandles; +import java.security.Principal; +import java.util.ArrayList; +import java.util.Arrays; +import java.util.HashMap; +import java.util.HashSet; +import java.util.List; +import java.util.Map; +import java.util.Set; +import java.util.function.Function; + +import org.apache.solr.common.SpecProvider; +import org.apache.solr.common.util.CommandOperation; +import org.apache.solr.common.util.Utils; +import org.apache.solr.common.util.ValidatingJsonMap; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import static java.util.Collections.unmodifiableMap; +import static java.util.function.Function.identity; +import static java.util.stream.Collectors.toMap; +import static org.apache.solr.handler.admin.SecurityConfHandler.getListValue; + +/** + * Base class for rule based authorization plugins + */ +public abstract class RuleBasedAuthorizationPluginBase implements AuthorizationPlugin, ConfigEditablePlugin, SpecProvider { Review comment: We could of course have kept one RBAC class, made the user-group mapping optional and always checked for roles on Principal, but I like the subclass approach better. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
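As a hedged illustration of the subclass design janhoy describes, here is a simplified sketch; the class and method names are invented for the example and do not match the real plugin API:
```java
import java.security.Principal;
import java.util.Collections;
import java.util.Map;
import java.util.Set;

// The base class owns permission/rule evaluation; subclasses only decide
// where a user's roles come from.
abstract class RuleBasedAuthzBase {
  protected abstract Set<String> getUserRoles(Principal principal);
}

// Variant 1: roles come from an explicit user -> roles mapping in config.
class ConfigMappedAuthz extends RuleBasedAuthzBase {
  private final Map<String, Set<String>> usersToRoles;

  ConfigMappedAuthz(Map<String, Set<String>> usersToRoles) {
    this.usersToRoles = usersToRoles;
  }

  @Override
  protected Set<String> getUserRoles(Principal p) {
    return usersToRoles.getOrDefault(p.getName(), Collections.emptySet());
  }
}

// Variant 2: roles are carried by the Principal itself, e.g. attached by an
// external authentication plugin, so no mapping is required.
class ExternalRoleAuthz extends RuleBasedAuthzBase {
  interface RolesPrincipal extends Principal {
    Set<String> roles();
  }

  @Override
  protected Set<String> getUserRoles(Principal p) {
    return p instanceof RolesPrincipal ? ((RolesPrincipal) p).roles() : Collections.emptySet();
  }
}
```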
[GitHub] [lucene-solr] jpountz commented on a change in pull request #1473: LUCENE-9353: Move terms metadata to its own file.
jpountz commented on a change in pull request #1473: URL: https://github.com/apache/lucene-solr/pull/1473#discussion_r418963368 ## File path: lucene/core/src/java/org/apache/lucene/codecs/blocktree/BlockTreeTermsReader.java ## @@ -148,56 +155,80 @@ public BlockTreeTermsReader(PostingsReaderBase postingsReader, SegmentReadState CodecUtil.retrieveChecksum(termsIn); // Read per-field details - seekDir(termsIn); - seekDir(indexIn); + String metaName = IndexFileNames.segmentFileName(segment, state.segmentSuffix, TERMS_META_EXTENSION); + Map fieldMap = null; + Throwable priorE = null; + try (ChecksumIndexInput metaIn = version >= VERSION_META_FILE ? state.directory.openChecksumInput(metaName, state.context) : null) { +try { + final IndexInput indexMetaIn, termsMetaIn; + if (version >= VERSION_META_FILE) { +CodecUtil.checkIndexHeader(metaIn, TERMS_META_CODEC_NAME, version, version, state.segmentInfo.getId(), state.segmentSuffix); +indexMetaIn = termsMetaIn = metaIn; + } else { +seekDir(termsIn); +seekDir(indexIn); +indexMetaIn = indexIn; +termsMetaIn = termsIn; + } - final int numFields = termsIn.readVInt(); - if (numFields < 0) { -throw new CorruptIndexException("invalid numFields: " + numFields, termsIn); - } - fieldMap = new HashMap<>((int) (numFields / 0.75f) + 1); - for (int i = 0; i < numFields; ++i) { -final int field = termsIn.readVInt(); -final long numTerms = termsIn.readVLong(); -if (numTerms <= 0) { - throw new CorruptIndexException("Illegal numTerms for field number: " + field, termsIn); -} -final BytesRef rootCode = readBytesRef(termsIn); -final FieldInfo fieldInfo = state.fieldInfos.fieldInfo(field); -if (fieldInfo == null) { - throw new CorruptIndexException("invalid field number: " + field, termsIn); -} -final long sumTotalTermFreq = termsIn.readVLong(); -// when frequencies are omitted, sumDocFreq=sumTotalTermFreq and only one value is written. -final long sumDocFreq = fieldInfo.getIndexOptions() == IndexOptions.DOCS ? sumTotalTermFreq : termsIn.readVLong(); -final int docCount = termsIn.readVInt(); -if (version < VERSION_META_LONGS_REMOVED) { - final int longsSize = termsIn.readVInt(); - if (longsSize < 0) { -throw new CorruptIndexException("invalid longsSize for field: " + fieldInfo.name + ", longsSize=" + longsSize, termsIn); + final int numFields = termsMetaIn.readVInt(); + if (numFields < 0) { +throw new CorruptIndexException("invalid numFields: " + numFields, termsMetaIn); + } + fieldMap = new HashMap<>((int) (numFields / 0.75f) + 1); + for (int i = 0; i < numFields; ++i) { +final int field = termsMetaIn.readVInt(); +final long numTerms = termsMetaIn.readVLong(); +if (numTerms <= 0) { + throw new CorruptIndexException("Illegal numTerms for field number: " + field, termsMetaIn); +} +final BytesRef rootCode = readBytesRef(termsMetaIn); +final FieldInfo fieldInfo = state.fieldInfos.fieldInfo(field); +if (fieldInfo == null) { + throw new CorruptIndexException("invalid field number: " + field, termsMetaIn); +} +final long sumTotalTermFreq = termsMetaIn.readVLong(); +// when frequencies are omitted, sumDocFreq=sumTotalTermFreq and only one value is written. +final long sumDocFreq = fieldInfo.getIndexOptions() == IndexOptions.DOCS ? 
sumTotalTermFreq : termsMetaIn.readVLong(); +final int docCount = termsMetaIn.readVInt(); +if (version < VERSION_META_LONGS_REMOVED) { + final int longsSize = termsMetaIn.readVInt(); + if (longsSize < 0) { +throw new CorruptIndexException("invalid longsSize for field: " + fieldInfo.name + ", longsSize=" + longsSize, termsMetaIn); + } +} +BytesRef minTerm = readBytesRef(termsMetaIn); +BytesRef maxTerm = readBytesRef(termsMetaIn); +if (docCount < 0 || docCount > state.segmentInfo.maxDoc()) { // #docs with field must be <= #docs + throw new CorruptIndexException("invalid docCount: " + docCount + " maxDoc: " + state.segmentInfo.maxDoc(), termsMetaIn); Review comment: Not directly, and these things are hard to test, though I agree we could do better. I opened https://issues.apache.org/jira/browse/LUCENE-9356 to try to improve the coverage of these code paths. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub
[jira] [Resolved] (SOLR-14453) Solr proximity search highlighting issue
[ https://issues.apache.org/jira/browse/SOLR-14453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley resolved SOLR-14453. - Resolution: Won't Fix I think this is probably a bug or limitation in the underlying SpanQuery and not the highlighters. There are some limitations of SpanQuery-based queries in which they won't necessarily report all matches in the returned "spans". I don't think SpanQueries are going to be fixed (sorry) because it's a rather fundamental problem with their internal design. Instead, the Lucene project recently created a new class of queries to semi-replace SpanQuery: {{IntervalQuery}} -- _tah-dah_! I did a quick hack of {{org.apache.lucene.search.uhighlight.TestUnifiedHighlighterTermIntervals#testMatchesSlopBug}} to tweak it to look like your bug report here, and it highlighted the terms as you want. Unfortunately, there are no query parsers in Lucene or Solr that produce them yet. Perhaps ComplexPhraseQueryParser should be modified to use IntervalQuery instead of SpanQuery. CC [~romseygeek] > Solr proximity search highlighting issue > > > Key: SOLR-14453 > URL: https://issues.apache.org/jira/browse/SOLR-14453 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: highlighter >Affects Versions: 8.4.1 >Reporter: amit naliyapara >Priority: Major > Attachments: Highlighted-response.PNG, Not-Highlighted-response.PNG, > managed-schema, solr-doc-Id-1.txt > > > I found a problem in the highlighting module. Not all of the search terms are > getting highlighted. > Sample query: q={!complexphrase+inOrder=true}"pos1 (pos2 OR pos3)"~30&hl=true > Indexed text: "pos1 pos2 pos3 pos4" > You can see that only two terms are highlighted like, "pos1 > pos2 pos3 pos4" > Please find attached Not-highlighted-response screen shot for same. > The scenario occurs when term positions are in order in both the document and the query. > If term positions are not in order then it works properly. > Sample query: q={!complexphrase+inOrder=false}"pos3 (pos1 OR pos2)"~30&hl=true > You can see that all three terms are highlighted like, "pos1 > pos2 pos3 pos4" > Please find attached Highlighted-response screen shot for same. > The scenario has been the same in the Solr source code for a long time (I have > checked Solr versions 4 through 7). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
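For readers who want to try the suggested replacement, here is a hedged sketch of an intervals equivalent of the reported query. Package names vary by Lucene version (the intervals classes graduated from the sandbox around 8.5), and {{maxwidth}} is only approximately equivalent to complexphrase slop; treat this as a starting point, not a drop-in replacement:
{code:java}
import org.apache.lucene.queries.intervals.IntervalQuery;
import org.apache.lucene.queries.intervals.Intervals;
import org.apache.lucene.queries.intervals.IntervalsSource;
import org.apache.lucene.search.Query;

// "pos1 (pos2 OR pos3)"~30 with inOrder=true, expressed as intervals:
IntervalsSource source = Intervals.maxwidth(30,
    Intervals.ordered(
        Intervals.term("pos1"),
        Intervals.or(Intervals.term("pos2"), Intervals.term("pos3"))));
Query q = new IntervalQuery("content", source);
{code}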
[jira] [Commented] (LUCENE-9321) Port documentation task to gradle
[ https://issues.apache.org/jira/browse/LUCENE-9321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17098065#comment-17098065 ] Dawid Weiss commented on LUCENE-9321: - bq. For the website and .tar.gz release (so packaging) the release manager should run the whole javadocs a second time (we should not copy them). I wouldn't require a second pass. If it's something required for the "release" then let's have a release task in gradle and take care of it there. Otherwise the "release" scripts are duplicating what could as well be done within the main build script? Also, I'm sorry if this is a stupid question but can we just *not* have any cross-module links at all? How many of these cross-module links are we talking about? Maybe we can just dump them altogether? > Port documentation task to gradle > - > > Key: LUCENE-9321 > URL: https://issues.apache.org/jira/browse/LUCENE-9321 > Project: Lucene - Core > Issue Type: Sub-task > Components: general/build >Reporter: Tomoko Uchida >Assignee: Tomoko Uchida >Priority: Major > Attachments: screenshot-1.png > > Time Spent: 2.5h > Remaining Estimate: 0h > > This is a placeholder issue for porting ant "documentation" task to gradle. > The generated documents should be able to be published on lucene.apache.org > web site on "as-is" basis. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Comment Edited] (LUCENE-9321) Port documentation task to gradle
[ https://issues.apache.org/jira/browse/LUCENE-9321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17098065#comment-17098065 ] Dawid Weiss edited comment on LUCENE-9321 at 5/2/20, 7:07 PM: -- bq. For the website and .tar.gz release (so packaging) the release manager should run the whole javadocs a second time (we should not copy them). I wouldn't require a second (independent) build pass. If it's something required for the "release" then let's have a release task in gradle and take care of it there (javadocs built twice but within the same run of the build - the "release" build). Otherwise the "release" scripts are duplicating what could as well be done within the main build script? Also, I'm sorry if this is a stupid question but can we just *not* have any cross-module links at all? How many of these cross-module links are we talking about? Maybe we can just dump them altogether? was (Author: dweiss): bq. For the website and .tar.gz release (so packaging) the release manager should run the whole javadocs a second time (we should not copy them). I wouldn't require a second pass. If it's something required for the "release" then let's have a releast task in gradle and take care of it there. Otherwise the "release" scripts are duplicating what could as well be done within the main build script? Also, I'm sorry if this is a stupid question but can we just *not* have any cross-module links at all? How many of these cross-module links are we talking about? Maybe we can just dump them altogether? > Port documentation task to gradle > - > > Key: LUCENE-9321 > URL: https://issues.apache.org/jira/browse/LUCENE-9321 > Project: Lucene - Core > Issue Type: Sub-task > Components: general/build >Reporter: Tomoko Uchida >Assignee: Tomoko Uchida >Priority: Major > Attachments: screenshot-1.png > > Time Spent: 2.5h > Remaining Estimate: 0h > > This is a placeholder issue for porting ant "documentation" task to gradle. > The generated documents should be able to be published on lucene.apache.org > web site on "as-is" basis. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] mkhludnev commented on a change in pull request #1462: LUCENE-9328: open group.sort docvalues once per segment.
mkhludnev commented on a change in pull request #1462: URL: https://github.com/apache/lucene-solr/pull/1462#discussion_r419010061 ## File path: lucene/grouping/src/java/org/apache/lucene/search/grouping/DocValuesPoolingReader.java ## @@ -0,0 +1,175 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.search.grouping; + +import java.io.IOException; +import java.util.HashMap; +import java.util.Map; + +import org.apache.lucene.index.BinaryDocValues; +import org.apache.lucene.index.DocValues; +import org.apache.lucene.index.FilterLeafReader; +import org.apache.lucene.index.LeafReader; +import org.apache.lucene.index.NumericDocValues; +import org.apache.lucene.index.SortedDocValues; +import org.apache.lucene.index.SortedNumericDocValues; +import org.apache.lucene.index.SortedSetDocValues; +import org.apache.lucene.index.TermsEnum; +import org.apache.lucene.search.DocIdSetIterator; +import org.apache.lucene.util.BytesRef; + +/** + * Caches docValues for the given {@linkplain LeafReader}. + * It is only necessary when a consumer retrieves the same docValues many times per + * segment. Returned docValues should be iterated forward only. + * Caveat: {@link #getContext()} is completely misleading for this class since + * it loses baseDoc and ord from the underlying context. 
+ * @lucene.experimental + * */ +class DocValuesPoolingReader extends FilterLeafReader { + + @FunctionalInterface + interface DVSupplier<T> { +T getDocValues(String field) throws IOException; + } + + private Map<String, Object> cache = new HashMap<>(); + + DocValuesPoolingReader(LeafReader in) { +super(in); + } + + @SuppressWarnings("unchecked") + protected <T> T computeIfAbsent(String field, DVSupplier<T> supplier) throws IOException { +T dv; +if ((dv = (T) cache.get(field)) == null) { + dv = supplier.getDocValues(field); + cache.put(field, dv); +} +return dv; + } + + @Override + public CacheHelper getReaderCacheHelper() { +return null; + } + + @Override + public CacheHelper getCoreCacheHelper() { +return null; + } + + @Override + public BinaryDocValues getBinaryDocValues(String field) throws IOException { +return computeIfAbsent(field, in::getBinaryDocValues); + } + + @Override + public NumericDocValues getNumericDocValues(String field) throws IOException { +return computeIfAbsent(field, in::getNumericDocValues); + } + + @Override + public SortedNumericDocValues getSortedNumericDocValues(String field) throws IOException { +return computeIfAbsent(field, in::getSortedNumericDocValues); + } + + public SortedDocValues getSortedDocValues(String field) throws IOException { +return computeIfAbsent(field, in::getSortedDocValues); + } + + @Override + public SortedSetDocValues getSortedSetDocValues(String field) throws IOException { +return computeIfAbsent(field, field1 -> { + final SortedSetDocValues sortedSet = in.getSortedSetDocValues(field1); + final SortedDocValues singleton = DocValues.unwrapSingleton(sortedSet); Review comment: `SingletonWrapper` is too strict, so I'm relaxing it in our own copy. The same needs to be done for `NumericsSetDV` as well. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] mkhludnev commented on a change in pull request #1462: LUCENE-9328: open group.sort docvalues once per segment.
mkhludnev commented on a change in pull request #1462: URL: https://github.com/apache/lucene-solr/pull/1462#discussion_r419010144 ## File path: lucene/grouping/src/test/org/apache/lucene/search/grouping/AllGroupHeadsCollectorTest.java ## @@ -153,23 +187,149 @@ public void testBasic() throws Exception { assertTrue(openBitSetContains(new int[]{1, 5}, allGroupHeadsCollector.retrieveGroupHeads(maxDoc), maxDoc)); // STRING sort type triggers different implementation -Sort sortWithinGroup2 = new Sort(new SortField("id_2", SortField.Type.STRING, true)); -allGroupHeadsCollector = createRandomCollector(groupField, sortWithinGroup2); +for (Function sortFunc : new Function[] { + // (r) -> new SortField("id_2", SortField.Type.STRING, (boolean) r), + // (r) -> new SortedSetSortField("id_3", (boolean) r), +(r) -> new SortedSetSortField("id_4", (boolean) r) +}) { + + Sort sortWithinGroup2 = new Sort(sortFunc.apply(true)); + allGroupHeadsCollector = createRandomCollector(groupField, sortWithinGroup2); + indexSearcher.search(new TermQuery(new Term("content", "random")), allGroupHeadsCollector); + assertTrue(arrayContains(new int[] {2, 3, 5, 7}, allGroupHeadsCollector.retrieveGroupHeads())); + assertTrue(openBitSetContains(new int[] {2, 3, 5, 7}, allGroupHeadsCollector.retrieveGroupHeads(maxDoc), maxDoc)); + + Sort sortWithinGroup3 = new Sort(sortFunc.apply(false)); + allGroupHeadsCollector = createRandomCollector(groupField, sortWithinGroup3); + indexSearcher.search(new TermQuery(new Term("content", "random")), allGroupHeadsCollector); + // 7 b/c higher doc id wins, even if order of field is in not in reverse. + assertTrue(arrayContains(new int[] {0, 3, 4, 6}, allGroupHeadsCollector.retrieveGroupHeads())); + assertTrue(openBitSetContains(new int[] {0, 3, 4, 6}, allGroupHeadsCollector.retrieveGroupHeads(maxDoc), maxDoc)); +} +indexSearcher.getIndexReader().close(); +dir.close(); + } + + public void testBasicBlockJoin() throws Exception { +final String groupField = "author"; +Directory dir = newDirectory(); +RandomIndexWriter w = new RandomIndexWriter( +random(), +dir, +newIndexWriterConfig(new MockAnalyzer(random())).setMergePolicy(newLogMergePolicy())); +DocValuesType valueType = DocValuesType.SORTED; + +// 0 +Document doc = new Document(); +addGroupField(doc, groupField, "author1", valueType); +doc.add(newTextField("content", "random text", Field.Store.NO)); +doc.add(new NumericDocValuesField("id_1", 1)); +doc.add(new SortedDocValuesField("id_2", new BytesRef("1"))); +addParent(w, doc, new SortedSetDocValuesField("id_3", new BytesRef("10")), +new SortedSetDocValuesField("id_3", new BytesRef("11"))); + +// 1 +doc = new Document(); +addGroupField(doc, groupField, "author1", valueType); +doc.add(newTextField("content", "some more random text blob", Field.Store.NO)); +doc.add(new NumericDocValuesField("id_1", 2)); +doc.add(new SortedDocValuesField("id_2", new BytesRef("2"))); +addParent(w, doc, new SortedSetDocValuesField("id_3", new BytesRef("20")), + new SortedSetDocValuesField("id_3", new BytesRef("21"))); + +// 2 +doc = new Document(); +addGroupField(doc, groupField, "author1", valueType); +doc.add(newTextField("content", "some more random textual data", Field.Store.NO)); +doc.add(new NumericDocValuesField("id_1", 3)); +doc.add(new SortedDocValuesField("id_2", new BytesRef("3"))); +addParent(w, doc, new SortedSetDocValuesField("id_3", new BytesRef("30")), + new SortedSetDocValuesField("id_3", new BytesRef("31"))); +w.commit(); // To ensure a second segment + +// 3 +doc = new Document(); 
+addGroupField(doc, groupField, "author2", valueType); +doc.add(newTextField("content", "some random text", Field.Store.NO)); +doc.add(new NumericDocValuesField("id_1", 4)); +doc.add(new SortedDocValuesField("id_2", new BytesRef("4"))); +addParent(w, doc, new SortedSetDocValuesField("id_3", new BytesRef("40")), + new SortedSetDocValuesField("id_3", new BytesRef("41"))); + +// 4 +doc = new Document(); +addGroupField(doc, groupField, "author3", valueType); +doc.add(newTextField("content", "some more random text", Field.Store.NO)); +doc.add(new NumericDocValuesField("id_1", 5)); +doc.add(new SortedDocValuesField("id_2", new BytesRef("5"))); +addParent(w, doc, new SortedSetDocValuesField("id_3", new BytesRef("50")), + new SortedSetDocValuesField("id_3", new BytesRef("51"))); + +// 5 +doc = new Document(); +addGroupField(doc, groupField, "author3", valueType); +doc.add(newTextField("content", "random blob", Field.Store.NO)); +doc.add(new NumericDocValuesField("id_1", 6)); +doc.add(new SortedDocValuesField("id_2", new BytesRef("6")))
[GitHub] [lucene-solr] mkhludnev commented on a change in pull request #1462: LUCENE-9328: open group.sort docvalues once per segment.
mkhludnev commented on a change in pull request #1462: URL: https://github.com/apache/lucene-solr/pull/1462#discussion_r419010400 ## File path: lucene/join/src/java/org/apache/lucene/search/join/BlockJoinSelector.java ## @@ -112,9 +112,9 @@ public static SortedDocValues wrap(final SortedDocValues values, Type selection, * one value per parent among its {@code children} using the configured * {@code selection} type. */ public static SortedDocValues wrap(final SortedDocValues values, Type selection, BitSet parents, DocIdSetIterator children) { -if (values.docID() != -1) { - throw new IllegalArgumentException("values iterator was already consumed: values.docID=" + values.docID()); -} +//if (values.docID() != -1) { Review comment: To be discussed later, once all other issues are resolved. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] mkhludnev commented on a change in pull request #1462: LUCENE-9328: open group.sort docvalues once per segment.
mkhludnev commented on a change in pull request #1462: URL: https://github.com/apache/lucene-solr/pull/1462#discussion_r419010654 ## File path: lucene/test-framework/src/java/org/apache/lucene/codecs/asserting/AssertingDocValuesFormat.java ## @@ -285,7 +286,9 @@ public SortedSetDocValues getSortedSet(FieldInfo field) throws IOException { assert field.getDocValuesType() == DocValuesType.SORTED_SET; SortedSetDocValues values = in.getSortedSet(field); assert values != null; - return new AssertingLeafReader.AssertingSortedSetDocValues(values, maxDoc); + final SortedDocValues singleton = DocValues.unwrapSingleton(values); + return singleton==null ? new AssertingLeafReader.AssertingSortedSetDocValues(values, maxDoc) : Review comment: I think it's worth extending this to NumericsSet and also to other usages of AssertingDV. Right now, these DVs aren't handled by `DocValues.unwrapSingleton()`. @romseygeek, isn't it worth committing separately? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
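For reference, a small self-contained sketch of how `DocValues.unwrapSingleton` is typically consumed; names here are illustrative and this is not the asserting-codec code:
```java
import java.io.IOException;
import org.apache.lucene.index.DocValues;
import org.apache.lucene.index.LeafReader;
import org.apache.lucene.index.SortedDocValues;
import org.apache.lucene.index.SortedSetDocValues;

final class SingletonProbe {
  // If a SortedSet field is in fact single-valued, unwrapSingleton exposes
  // the cheaper SortedDocValues view; callers branch on null.
  static void inspect(LeafReader reader, String field) throws IOException {
    SortedSetDocValues multi = reader.getSortedSetDocValues(field);
    if (multi == null) {
      return; // field has no doc values in this segment
    }
    SortedDocValues single = DocValues.unwrapSingleton(multi);
    if (single != null) {
      // single-valued fast path: one ord per document
    } else {
      // genuinely multi-valued: iterate ords per document
    }
  }
}
```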
[GitHub] [lucene-solr] MarcusSorealheis commented on pull request #1471: Revised SOLR-14014 PR Against Master
MarcusSorealheis commented on pull request #1471: URL: https://github.com/apache/lucene-solr/pull/1471#issuecomment-623017503 > It would be nice to make the Admin UI Disabled page a little bit prettier if possible, something more akin to the login page, but that might be out of scope of this PR. Feel free to defer that to a follow-on if you think it's better to handle separately. Subsequent PR, please. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9328) SortingGroupHead to reuse DocValues
[ https://issues.apache.org/jira/browse/LUCENE-9328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17098112#comment-17098112 ] Mikhail Khludnev commented on LUCENE-9328: -- Ok. [~romseygeek], thanks for the clue. It turns out to be a more complex problem. I pushed some broken code to the github PR https://github.com/apache/lucene-solr/pull/1462/. Currently I'm stuck at [ToParentDocValues.advanceExact()|https://github.com/apache/lucene-solr/blob/master/lucene/join/src/java/org/apache/lucene/search/join/ToParentDocValues.java#L259] which turns it into a strict {{advance()}}. I think it can be changed to use {{advanceExact()}}. Does that make any sense? > SortingGroupHead to reuse DocValues > --- > > Key: LUCENE-9328 > URL: https://issues.apache.org/jira/browse/LUCENE-9328 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/grouping >Reporter: Mikhail Khludnev >Assignee: Mikhail Khludnev >Priority: Minor > Attachments: LUCENE-9328.patch, LUCENE-9328.patch, LUCENE-9328.patch > > Time Spent: 50m > Remaining Estimate: 0h > > That's why > https://issues.apache.org/jira/browse/LUCENE-7701?focusedCommentId=17084365&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17084365 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
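For readers unfamiliar with the two positioning styles being discussed, here is a short hedged sketch of plain doc-values consumption, assuming a {{leafReader}} and a {{docId}} in scope; the field name is hypothetical and this is not the {{ToParentDocValues}} internals:
{code:java}
// advance(target) jumps to the first doc >= target that has a value, while
// advanceExact(target) positions on exactly "target" and merely reports
// whether that doc has a value; both must be called with increasing targets.
NumericDocValues dv = leafReader.getNumericDocValues("price"); // hypothetical field
if (dv != null && dv.advanceExact(docId)) {
  long value = dv.longValue(); // docId is guaranteed to have a value here
}
{code}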
[jira] [Comment Edited] (LUCENE-9328) SortingGroupHead to reuse DocValues
[ https://issues.apache.org/jira/browse/LUCENE-9328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17098112#comment-17098112 ] Mikhail Khludnev edited comment on LUCENE-9328 at 5/2/20, 10:11 PM: Ok. [~romseygeek], thanks for the clue. Sorry for noise. WIP. was (Author: mkhludnev): Ok. [~romseygeek], thanks for the clue. It turns to be more complex problem. I pushed some broken code to github PR https://github.com/apache/lucene-solr/pull/1462/. Currently I'm stuck at [ToParentDocValues.advanceExact()|https://github.com/apache/lucene-solr/blob/master/lucene/join/src/java/org/apache/lucene/search/join/ToParentDocValues.java#L259] which turns it to strict {{advance()}}. I think it's can be changed to use {{advanceExact()}}. Does it make any sense? > SortingGroupHead to reuse DocValues > --- > > Key: LUCENE-9328 > URL: https://issues.apache.org/jira/browse/LUCENE-9328 > Project: Lucene - Core > Issue Type: Improvement > Components: modules/grouping >Reporter: Mikhail Khludnev >Assignee: Mikhail Khludnev >Priority: Minor > Attachments: LUCENE-9328.patch, LUCENE-9328.patch, LUCENE-9328.patch > > Time Spent: 50m > Remaining Estimate: 0h > > That's why > https://issues.apache.org/jira/browse/LUCENE-7701?focusedCommentId=17084365&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17084365 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (LUCENE-9321) Port documentation task to gradle
[ https://issues.apache.org/jira/browse/LUCENE-9321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17098173#comment-17098173 ] Tomoko Uchida commented on LUCENE-9321: --- bq. How many of these cross-module links are we talking about? Maybe we can just dump them altogether? I think the "checkJavadocLinks.py" does the work. It collects all {{href}} attributes in the given HTML (regardless of whether they are absolute or relative, or whether they are external or cross-module links). Maybe we can dump all links by inserting a 'print' and running the script. [https://github.com/apache/lucene-solr/blob/master/dev-tools/scripts/checkJavadocLinks.py#L31] > Port documentation task to gradle > - > > Key: LUCENE-9321 > URL: https://issues.apache.org/jira/browse/LUCENE-9321 > Project: Lucene - Core > Issue Type: Sub-task > Components: general/build >Reporter: Tomoko Uchida >Assignee: Tomoko Uchida >Priority: Major > Attachments: screenshot-1.png > > Time Spent: 2.5h > Remaining Estimate: 0h > > This is a placeholder issue for porting ant "documentation" task to gradle. > The generated documents should be able to be published on lucene.apache.org > web site on "as-is" basis. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] msokolov commented on a change in pull request #1473: LUCENE-9353: Move terms metadata to its own file.
msokolov commented on a change in pull request #1473: URL: https://github.com/apache/lucene-solr/pull/1473#discussion_r419039099 ## File path: lucene/core/src/java/org/apache/lucene/codecs/blocktree/BlockTreeTermsReader.java ## @@ -148,56 +155,80 @@ public BlockTreeTermsReader(PostingsReaderBase postingsReader, SegmentReadState CodecUtil.retrieveChecksum(termsIn); // Read per-field details - seekDir(termsIn); - seekDir(indexIn); + String metaName = IndexFileNames.segmentFileName(segment, state.segmentSuffix, TERMS_META_EXTENSION); + Map fieldMap = null; + Throwable priorE = null; + try (ChecksumIndexInput metaIn = version >= VERSION_META_FILE ? state.directory.openChecksumInput(metaName, state.context) : null) { +try { + final IndexInput indexMetaIn, termsMetaIn; + if (version >= VERSION_META_FILE) { +CodecUtil.checkIndexHeader(metaIn, TERMS_META_CODEC_NAME, version, version, state.segmentInfo.getId(), state.segmentSuffix); +indexMetaIn = termsMetaIn = metaIn; + } else { +seekDir(termsIn); +seekDir(indexIn); +indexMetaIn = indexIn; +termsMetaIn = termsIn; + } - final int numFields = termsIn.readVInt(); - if (numFields < 0) { -throw new CorruptIndexException("invalid numFields: " + numFields, termsIn); - } - fieldMap = new HashMap<>((int) (numFields / 0.75f) + 1); - for (int i = 0; i < numFields; ++i) { -final int field = termsIn.readVInt(); -final long numTerms = termsIn.readVLong(); -if (numTerms <= 0) { - throw new CorruptIndexException("Illegal numTerms for field number: " + field, termsIn); -} -final BytesRef rootCode = readBytesRef(termsIn); -final FieldInfo fieldInfo = state.fieldInfos.fieldInfo(field); -if (fieldInfo == null) { - throw new CorruptIndexException("invalid field number: " + field, termsIn); -} -final long sumTotalTermFreq = termsIn.readVLong(); -// when frequencies are omitted, sumDocFreq=sumTotalTermFreq and only one value is written. -final long sumDocFreq = fieldInfo.getIndexOptions() == IndexOptions.DOCS ? sumTotalTermFreq : termsIn.readVLong(); -final int docCount = termsIn.readVInt(); -if (version < VERSION_META_LONGS_REMOVED) { - final int longsSize = termsIn.readVInt(); - if (longsSize < 0) { -throw new CorruptIndexException("invalid longsSize for field: " + fieldInfo.name + ", longsSize=" + longsSize, termsIn); + final int numFields = termsMetaIn.readVInt(); + if (numFields < 0) { +throw new CorruptIndexException("invalid numFields: " + numFields, termsMetaIn); + } + fieldMap = new HashMap<>((int) (numFields / 0.75f) + 1); + for (int i = 0; i < numFields; ++i) { +final int field = termsMetaIn.readVInt(); +final long numTerms = termsMetaIn.readVLong(); +if (numTerms <= 0) { + throw new CorruptIndexException("Illegal numTerms for field number: " + field, termsMetaIn); +} +final BytesRef rootCode = readBytesRef(termsMetaIn); +final FieldInfo fieldInfo = state.fieldInfos.fieldInfo(field); +if (fieldInfo == null) { + throw new CorruptIndexException("invalid field number: " + field, termsMetaIn); +} +final long sumTotalTermFreq = termsMetaIn.readVLong(); +// when frequencies are omitted, sumDocFreq=sumTotalTermFreq and only one value is written. +final long sumDocFreq = fieldInfo.getIndexOptions() == IndexOptions.DOCS ? 
sumTotalTermFreq : termsMetaIn.readVLong(); +final int docCount = termsMetaIn.readVInt(); +if (version < VERSION_META_LONGS_REMOVED) { + final int longsSize = termsMetaIn.readVInt(); + if (longsSize < 0) { +throw new CorruptIndexException("invalid longsSize for field: " + fieldInfo.name + ", longsSize=" + longsSize, termsMetaIn); + } +} +BytesRef minTerm = readBytesRef(termsMetaIn); +BytesRef maxTerm = readBytesRef(termsMetaIn); +if (docCount < 0 || docCount > state.segmentInfo.maxDoc()) { // #docs with field must be <= #docs + throw new CorruptIndexException("invalid docCount: " + docCount + " maxDoc: " + state.segmentInfo.maxDoc(), termsMetaIn); Review comment: Thanks, Adrien This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org --
[jira] [Commented] (SOLR-13289) Support for BlockMax WAND
[ https://issues.apache.org/jira/browse/SOLR-13289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17098238#comment-17098238 ] Kranti Parisa commented on SOLR-13289: -- Tomas et al., thanks for taking this up. This will be super useful in terms of performance gains. How does this work for: - Grouping, Sorting - Custom scoring via value sources - Payloads Is a TwoPhaseIterator, or custom ranking in collapse/expand mode or in the response writer, the way to go? Or do we have to implement a custom query that overrides createWeight with ScoreMode.TOP_SCORES? > Support for BlockMax WAND > - > > Key: SOLR-13289 > URL: https://issues.apache.org/jira/browse/SOLR-13289 > Project: Solr > Issue Type: New Feature >Reporter: Ishan Chattopadhyaya >Assignee: Tomas Eduardo Fernandez Lobbe >Priority: Major > Attachments: SOLR-13289.patch, SOLR-13289.patch > > Time Spent: 2h 40m > Remaining Estimate: 0h > > LUCENE-8135 introduced BlockMax WAND as a major speed improvement. Need to > expose this via Solr. When enabled, the numFound returned will not be exact. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
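For background, at the Lucene level Block-Max WAND is driven by {{ScoreMode.TOP_SCORES}} together with the collector's total-hits threshold. A hedged sketch of the raw Lucene API, assuming an {{IndexSearcher}} {{searcher}} and a {{Query}} {{query}} in scope; how Solr will expose this is exactly what the issue is about:
{code:java}
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.search.TopScoreDocCollector;
import org.apache.lucene.search.TotalHits;

// A low totalHitsThreshold lets the collector collect with TOP_SCORES and
// skip non-competitive blocks; the hit count then becomes a lower bound.
TopScoreDocCollector collector = TopScoreDocCollector.create(10, 10);
searcher.search(query, collector);
TopDocs top = collector.topDocs();
boolean exactCount = top.totalHits.relation == TotalHits.Relation.EQUAL_TO;
{code}
On the questions above: anything that must visit every matching document (grouping, sorting on fields, custom scores feeding value sources or payload functions) generally requires a complete score mode, so WAND-style skipping would not apply there; that seems consistent with the caveat that numFound becomes inexact only when the optimization is active.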
[jira] [Commented] (SOLR-14351) Harden MDCLoggingContext.clear depth tracking
[ https://issues.apache.org/jira/browse/SOLR-14351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17098250#comment-17098250 ] David Smiley commented on SOLR-14351: - Looks like I goofed in ZkContainer.java, where I removed the zkController != null check. Consequently, in _standalone mode_, you'll see NullPointerException errors (actually benign) with a stack looking like: at org.apache.solr.core.ZkContainer.lambda$registerInZk$1(ZkContainer.java:195) at org.apache.solr.core.ZkContainer.lambda$registerInZk$1(ZkContainer.java:195) at org.apache.solr.core.ZkContainer.registerInZk(ZkContainer.java:224) etc. I'll push a fix tomorrow with some updated javadocs on this ZkContainer class. > Harden MDCLoggingContext.clear depth tracking > - > > Key: SOLR-14351 > URL: https://issues.apache.org/jira/browse/SOLR-14351 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: David Smiley >Assignee: David Smiley >Priority: Minor > Fix For: 8.6 > > Time Spent: 1h > Remaining Estimate: 0h > > MDCLoggingContext tracks recursive calls and only clears when the recursion > level is back down to 0. If a caller forgets to register and ends up calling > clear anyway, then this can mess things up. Additionally I found at least > one place this is occurring, which led me to investigate this matter. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
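A heavily hedged sketch of the shape of the fix described above, i.e. restoring the standalone-mode guard; the method body is elided, the names simply follow the comment, and this is not the actual patch:
{code:java}
// In standalone mode there is no ZooKeeper, so zkController is null and
// there is nothing to register; the guard avoids the benign NPE above.
private void registerInZk(SolrCore core) {
  if (zkController == null) {
    return; // standalone mode
  }
  // ... SolrCloud registration as before ...
}
{code}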
[jira] [Commented] (LUCENE-9321) Port documentation task to gradle
[ https://issues.apache.org/jira/browse/LUCENE-9321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17098254#comment-17098254 ] Tomoko Uchida commented on LUCENE-9321: --- bq. Maybe we can dump all links by inserting a 'print' and running the script. I tried to dump all cross-module (relative) links with this patch to checkJavadocLinks.py {code} diff --git a/dev-tools/scripts/checkJavadocLinks.py b/dev-tools/scripts/checkJavadocLinks.py index 5d07e27a588..a96879536c9 100644 --- a/dev-tools/scripts/checkJavadocLinks.py +++ b/dev-tools/scripts/checkJavadocLinks.py @@ -74,6 +74,12 @@ class FindHyperlinks(HTMLParser): elif href is not None: assert name is None href = href.strip() +absolute_url = urlparse.urljoin(self.baseURL, href) +prefix1 = '/'.join(urlparse.urlparse(self.baseURL).path.split('/')[:5]) +prefix2 = '/'.join(urlparse.urlparse(absolute_url).path.split('/')[:5]) +# print only cross-module relative links +if re.match('^../', href) and prefix1 != prefix2: + print('%s\t%s\t%s' % (self.baseURL, href, absolute_url)) self.links.append(urlparse.urljoin(self.baseURL, href)) elif id is None: raise RuntimeError('couldn\'t find an href nor name in link in %s: only got these attrs: %s' % (self.baseURL, attrs)) @@ -130,8 +136,9 @@ def checkAll(dirName): global failures # Find/parse all HTML files first - print() - print('Crawl/parse...') + #print() + #print('Crawl/parse...') + print('filename\trelative path\tabsolute url') allFiles = {} if os.path.isfile(dirName): @@ -160,8 +167,8 @@ def checkAll(dirName): allFiles[fullPath] = parse(fullPath, open('%s/%s' % (root, f), encoding='UTF-8').read()) # ... then verify: - print() - print('Verify...') + #print() + #print('Verify...') for fullPath, (links, anchors) in allFiles.items(): #print fullPath printed = False {code} I don't want to attach the results (as the output file is large), but this can be run as below {code} lucene-solr $ python -B dev-tools/scripts/checkJavadocLinks.py lucene/build/docs/ > ~/work/lucene-javadocs-relative-paths.tsv lucene-solr $ wc -l ~/work/lucene-javadocs-relative-paths.tsv 31434 /home/moco/work/lucene-javadocs-relative-paths.tsv lucene-solr $ python -B dev-tools/scripts/checkJavadocLinks.py solr/build/docs/ > ~/work/solr-javadocs-relative-paths.tsv lucene-solr $ wc -l ~/work/solr-javadocs-relative-paths.tsv 9307 /home/moco/work/solr-javadocs-relative-paths.tsv {code} This includes both kinds of relative paths: links generated automatically by the javadoc tool and links hand-written by humans (I don't know whether there is a way to distinguish them). With the gradle scripts on the current master, the number should be reduced since all automatically generated links are absolute ones with the "renderJavadoc" task. > Port documentation task to gradle > - > > Key: LUCENE-9321 > URL: https://issues.apache.org/jira/browse/LUCENE-9321 > Project: Lucene - Core > Issue Type: Sub-task > Components: general/build >Reporter: Tomoko Uchida >Assignee: Tomoko Uchida >Priority: Major > Attachments: screenshot-1.png > > Time Spent: 2.5h > Remaining Estimate: 0h > > This is a placeholder issue for porting ant "documentation" task to gradle. > The generated documents should be able to be published on lucene.apache.org > web site on "as-is" basis. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org