date:20200924

[GitHub] [lucene-solr] s1monw commented on pull request #1918: LUCENE-9535: Commit DWPT bytes used before locking indexing

2020-09-24 Thread GitBox



s1monw commented on pull request #1918:
URL: https://github.com/apache/lucene-solr/pull/1918#issuecomment-698163487


   @jpountz I had to change some stuff to make it work. Down the road I want to 
clean this up more so we don't need the extra step, but I want to do this after 
we cut 8.7.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[jira] [Commented] (SOLR-14613) Provide a clean API for pluggable replica assignment implementations

2020-09-24 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/SOLR-14613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17201326#comment-17201326
 ] 

ASF subversion and git services commented on SOLR-14613:


Commit cafa449769fb131830ede129910287670625aa0d in lucene-solr's branch 
refs/heads/master from noblepaul
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=cafa449 ]

SOLR-14613: Avoid multiple ZK write


> Provide a clean API for pluggable replica assignment implementations
> 
>
> Key: SOLR-14613
> URL: https://issues.apache.org/jira/browse/SOLR-14613
> Project: Solr
>  Issue Type: Improvement
>  Components: AutoScaling
>Reporter: Andrzej Bialecki
>Assignee: Ilan Ginzburg
>Priority: Major
>  Time Spent: 41h
>  Remaining Estimate: 0h
>
> As described in SIP-8 the current autoscaling Policy implementation has 
> several limitations that make it difficult to use for very large clusters and 
> very large collections. SIP-8 also mentions the possible migration path by 
> providing alternative implementations of the placement strategies that are 
> less complex but more efficient in these very large environments.
> We should review the existing APIs that the current autoscaling engine uses 
> ({{SolrCloudManager}} , {{AssignStrategy}} , {{Suggester}} and related 
> interfaces) to see if they provide a sufficient and minimal API for plugging 
> in alternative autoscaling placement strategies, and if necessary refactor 
> the existing APIs.
> Since these APIs are internal it should be possible to do this without 
> breaking back-compat.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (LUCENE-9535) Investigate recent indexing slowdown for wikimedium documents

2020-09-24 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-9535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17201335#comment-17201335
 ] 

ASF subversion and git services commented on LUCENE-9535:
-

Commit c258905bd01f458df4924e361b2395f06e387b88 in lucene-solr's branch 
refs/heads/master from Simon Willnauer
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=c258905 ]

LUCENE-9535: Commit DWPT bytes used before locking indexing (#1918)

Currently we calculate the ramBytesUsed by the DWPT under the flushControl
lock. We can do this calculation safely outside of the lock without any 
downside.
The FlushControl lock should be used with care since it's a central part of 
indexing
and might block all indexing.
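
The idea in this commit message can be sketched roughly as follows. This is a minimal illustration, not Lucene's actual code: `FlushAccountingSketch`, `Writer` and `commitDelta` are hypothetical names, and the real DWPT and flush-control classes are more involved. The point is only that the per-writer size calculation happens before the shared lock is taken, so the lock protects nothing but a cheap addition.

```java
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical stand-in for flush-control accounting; not Lucene's actual classes. */
final class FlushAccountingSketch {
    // Central lock shared by all indexing threads (assumption for illustration).
    private final ReentrantLock lock = new ReentrantLock();
    private long activeBytes; // guarded by lock

    /** Hypothetical per-thread writer whose size computation may be expensive. */
    static final class Writer {
        private long bytes;         // current size, touched only by the owning thread
        private long lastCommitted; // size already reported to the shared counter

        void index(long docBytes) {
            bytes += docBytes;
        }

        /** Runs on the owning thread, without any shared lock held. */
        long commitDelta() {
            long delta = bytes - lastCommitted;
            lastCommitted = bytes;
            return delta;
        }
    }

    /** Only the cheap addition happens while the shared lock is held. */
    long afterDocument(Writer writer) {
        long delta = writer.commitDelta(); // potentially slow work, done before locking
        lock.lock();
        try {
            activeBytes += delta;
            return activeBytes;
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        FlushAccountingSketch control = new FlushAccountingSketch();
        Writer writer = new Writer();
        writer.index(1024);
        control.afterDocument(writer);
        writer.index(512);
        System.out.println("active bytes: " + control.afterDocument(writer));
    }
}
```

The sketch assumes each writer is used by a single thread at a time, which is what makes the unlocked delta computation safe.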

> Investigate recent indexing slowdown for wikimedium documents
> -
>
> Key: LUCENE-9535
> URL: https://issues.apache.org/jira/browse/LUCENE-9535
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Adrien Grand
>Priority: Minor
> Attachments: cpu_profile.svg
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Nightly benchmarks report a ~10% slowdown for 1kB documents as of September 
> 9th: [http://people.apache.org/~mikemccand/lucenebench/indexing.html].
> On that day, we added stored fields in DWPT accounting (LUCENE-9511), so I 
> first thought this could be due to smaller flushed segments and more merging, 
> but I still wonder whether there's something else. The benchmark runs with 
> 8GB of heap, 2GB of RAM buffer and 36 indexing threads. So it's about 2GB/36 
> = 57MB of RAM buffer per thread in the worst-case scenario that all DWPTs get 
> full at the same time. Stored fields account for about 0.7MB of memory, or 1% 
> of the indexing buffer size. How can a 1% reduction of buffering capacity 
> explain a 10% indexing slowdown? I looked into this further by running 
> indexing benchmarks locally with 8 indexing threads and 128MB of indexing 
> buffer memory, which would make this issue even more apparent if the smaller 
> RAM buffer was the cause, but I'm not seeing a regression and actually I'm 
> seeing a similar number of flushes when I disable memory accounting for stored 
> fields.
> I ran indexing under a profiler to see whether something else could cause 
> this slowdown, e.g. slow implementations of ramBytesUsed on stored fields 
> writers, but nothing surprising showed up and the profile looked just like I 
> would have expected.
> Another question I have is why the 4kB benchmark is not affected at all.

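The arithmetic in the description above can be checked with a few lines. This is only a worked calculation using the numbers quoted in the issue (2GB buffer, 36 threads, about 0.7MB for stored fields); the class and method names are made up for illustration, and 2GB is taken as 2048MB.

```java
/** Worked check of the per-thread buffer figures quoted in the issue description. */
final class RamBufferMath {
    /** Worst-case buffer share per indexing thread, in MB. */
    static double perThreadMb(double bufferMb, int threads) {
        return bufferMb / threads;
    }

    /** What percentage of the whole a part represents. */
    static double sharePercent(double partMb, double wholeMb) {
        return 100.0 * partMb / wholeMb;
    }

    public static void main(String[] args) {
        double perThread = perThreadMb(2048.0, 36);        // 2GB of RAM buffer, 36 threads
        double storedShare = sharePercent(0.7, perThread); // stored fields use about 0.7MB
        System.out.println(String.format("per thread: %.1f MB", perThread));
        System.out.println(String.format("stored fields share: %.2f %%", storedShare));
    }
}
```

The share comes out slightly above 1%, which matches the rough figure in the description and supports the point that this alone is unlikely to explain a 10% slowdown.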



[GitHub] [lucene-solr] s1monw merged pull request #1918: LUCENE-9535: Commit DWPT bytes used before locking indexing

2020-09-24 Thread GitBox



s1monw merged pull request #1918:
URL: https://github.com/apache/lucene-solr/pull/1918


   




[GitHub] [lucene-solr] noblepaul commented on a change in pull request #1906: SOLR-13528: Implement API Based Config For Rate Limiters

2020-09-24 Thread GitBox



noblepaul commented on a change in pull request #1906:
URL: https://github.com/apache/lucene-solr/pull/1906#discussion_r494114574



##
File path: solr/core/src/java/org/apache/solr/handler/ClusterAPI.java
##
@@ -206,7 +209,7 @@ public void setObjProperty(PayloadObj obj) {
 public void setProperty(PayloadObj> obj) throws Exception {
   Map m =  obj.getDataMap();
   m.put("action", CLUSTERPROP.toString());
-  collectionsHandler.handleRequestBody(wrapParams(obj.getRequest(),m ), obj.getResponse());
+  collectionsHandler.handleRequestBody(wrapParams(obj.getRequest(), m), obj.getResponse());

Review comment:
   why is there a change here?






[GitHub] [lucene-solr] dweiss commented on a change in pull request #1905: LUCENE-9488 Release with Gradle Part 2

2020-09-24 Thread GitBox



dweiss commented on a change in pull request #1905:
URL: https://github.com/apache/lucene-solr/pull/1905#discussion_r494116068



##
File path: lucene/build.gradle
##
@@ -15,8 +15,56 @@
  * limitations under the License.
  */
 
+// Should we do this as :lucene:packaging similar to how Solr does it?
+// Or is this fine here?
+
+plugins {
+  id 'distribution'
+}
+
 description = 'Parent project for Apache Lucene Core'
 
 subprojects {
   group "org.apache.lucene"
-}
\ No newline at end of file
+}
+
+distributions {
+  main {
+  // This is empirically wrong, but it is mostly a copy from `ant package-zip`

Review comment:
   Ok, fair enough. But I wouldn't want to add things to gradle that are 
not right - these are hard to get rid of later on. I'm really busy this week 
but I will correct those assembly bits and commit them to this PR. Please give 
me some time to work on this, thank you.






[jira] [Commented] (SOLR-14889) improve templated variable escaping in ref-guide _config.yml

2020-09-24 Thread Dawid Weiss (Jira)



[ 
https://issues.apache.org/jira/browse/SOLR-14889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17201355#comment-17201355
 ] 

Dawid Weiss commented on SOLR-14889:


Hi Chris. Uwe explained the particular problem, I wanted to follow up with a 
generic explanation.

The order of evaluation is, I think, the trickiest bit in gradle builds as it 
involves the syntactic sugar of groovy, hooks in gradle itself and the different 
"stages" of the gradle build itself. This is a good read:

https://docs.gradle.org/current/userguide/build_lifecycle.html

In short, the three phases are: initialization - bootstrap, reading the 
setup properties, etc. (we don't care much); evaluation - this is actually 
execution too - all build scripts (groovy) are run against an empty build 
graph; they add and configure tasks, prepare properties, declare dependencies 
(but shouldn't resolve them yet), etc. This is when you "set up" the build. The 
execution phase follows, when gradle decides which tasks to actually execute 
(based on user-provided task names, dependencies, configurations to resolve and 
cache states). The "doFirst" and "doLast" clauses in tasks add an anonymous 
closure to the list of things to run in the execution phase. So even if a script 
looks linear, the closure in doFirst and doLast runs at a completely different 
time than what's outside of it.

The simplest way to see what gets executed and when is to add debug printlns... 
It really helps sometimes.
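
To make the ordering concrete, here is a toy model in plain Java. It does not use Gradle's API at all: `Task`, `doFirst` and `doLast` are simplified stand-ins written for this illustration, and the only behaviour assumed is that both methods queue an action for later, with `doFirst` putting it at the front of the queue.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Toy model of "set up the build" vs. "execute tasks"; it does not use Gradle's API. */
final class BuildPhasesSketch {
    static final List<String> log = new ArrayList<>();

    /** Simplified stand-in for a task: it only stores actions to run later. */
    static final class Task {
        private final Deque<Runnable> actions = new ArrayDeque<>();

        void doFirst(Runnable action) {
            actions.addFirst(action); // queued at the front, not run now
        }

        void doLast(Runnable action) {
            actions.addLast(action); // queued at the end, not run now
        }

        void execute() {
            actions.forEach(Runnable::run);
        }
    }

    static List<String> run() {
        log.clear();
        // The "script body" runs top to bottom while the build is being set up.
        Task copy = new Task();
        log.add("configure: before doLast");
        copy.doLast(() -> log.add("execute: doLast"));
        copy.doFirst(() -> log.add("execute: doFirst"));
        log.add("configure: after doFirst");
        // Only later, in the execution phase, do the queued closures run.
        copy.execute();
        return new ArrayList<>(log);
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

Although the two closures appear in the middle of the script, both "configure" lines are logged before either closure runs, which is the same effect the debug printlns would reveal in a real build.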

> improve templated variable escaping in ref-guide _config.yml
> 
>
> Key: SOLR-14889
> URL: https://issues.apache.org/jira/browse/SOLR-14889
> Project: Solr
>  Issue Type: Task
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: documentation
>Reporter: Chris M. Hostetter
>Assignee: Chris M. Hostetter
>Priority: Major
> Attachments: SOLR-14889.patch, SOLR-14889.patch, SOLR-14889.patch
>
>
> SOLR-14824 ran into Windows failures when we switched from using a hardcoded 
> "relative" path to the solrRootPath to using groovy/project variables to get 
> the path. The reason for the failures was that the path is used as a 
> variable templated into {{_config.yml.template}} to build the {{_config.yml}} 
> file, but on Windows the path separator of '\' was being parsed by 
> jekyll/YAML as a string escape character.
> (This wasn't a problem we ran into before, even on Windows, prior to the 
> SOLR-14824 changes, because the hardcoded relative path only used '/' 
> delimiters, which (j)ruby was happy to work with, even on Windows.)
> As Uwe pointed out when hotfixing this...
> {quote}Problem was that backslashes are used to escape strings, but windows 
> paths also have those. Fix was to add StringEscapeUtils, but I don't like 
> this too much. Maybe we find a better solution to make special characters in 
> those properties escaped correctly when used in strings inside templates.
> {quote}
> ...the current fix of using {{StringEscapeUtils.escapeJava}} - only for this 
> one variable - doesn't really protect other variables that might have 
> special characters in them down the road, and while "escapeJava" works ok for 
> the "\" issue, it isn't necessarily consistent with all YAML escapes, which 
> could lead to even weirder bugs/confusion down the road.

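As an illustration of the escaping problem described above, the following sketch shows two ways a value could be quoted before being templated into a YAML file. The helper names are hypothetical and this is not the fix attached to the issue. It relies on two YAML rules: inside a double-quoted scalar a backslash starts an escape sequence, while inside a single-quoted scalar a backslash is literal and only the single quote itself needs doubling. Control characters are not handled here.

```java
/** Hypothetical helpers for quoting a templated value as a YAML scalar. */
final class YamlScalarSketch {
    /** Double-quoted scalar: backslash and double quote must be escaped. */
    static String doubleQuoted(String value) {
        return "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /** Single-quoted scalar: backslashes stay literal, a single quote is doubled. */
    static String singleQuoted(String value) {
        return "'" + value.replace("'", "''") + "'";
    }

    public static void main(String[] args) {
        String windowsPath = "C:\\solr\\new"; // contains \s and \n when read as escapes
        System.out.println("solr_root: " + doubleQuoted(windowsPath));
        System.out.println("solr_root: " + singleQuoted(windowsPath));
    }
}
```

Whether the template wraps its variables in double or single quotes is an assumption here; the point is that the escaping has to match the YAML quoting style rather than Java's string rules.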



[GitHub] [lucene-solr] dweiss commented on pull request #1836: LUCENE-9317: Clean up split package in analyzers-common

2020-09-24 Thread GitBox



dweiss commented on pull request #1836:
URL: https://github.com/apache/lucene-solr/pull/1836#issuecomment-698189266


   If there is test infrastructure that needs to be shared then I'd suggest 
creating a project that is a test-configuration dependency of the other 
subprojects (rather than cloning those classes). This is a simple and cheap 
thing to do with gradle.




[jira] [Commented] (SOLR-14889) improve templated variable escaping in ref-guide _config.yml

2020-09-24 Thread Uwe Schindler (Jira)



[ 
https://issues.apache.org/jira/browse/SOLR-14889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17201360#comment-17201360
 ] 

Uwe Schindler commented on SOLR-14889:
--

Hi [~dweiss]: Do you think my patch is fine? It works for me, although I don't 
like passing an empty map to the expand() configuration of the SyncTask.
I was looking for a method like expand() that takes a closure to expand 
properties, but the only one provided by the sync task is the one taking a Map. 
So I don't see any better alternative!

I was also looking into using project.provider(...), but that is also not 
accepted by expand().

IMHO, I would remove the extra task "populateLazyProps" and move this into the 
copy task (at the place where I added the doFirst()). Then it works linearly 
(because doFirst() is run explicitly before the main task method of SyncTask).

The lazy props have no effect on inputs, so there's no need to have it separated 
(inputs are also executed before). When it depends on the configuration, it's 
re-executed anyway.

I can provide a patch simplifying this.




[jira] [Commented] (SOLR-14889) improve templated variable escaping in ref-guide _config.yml

2020-09-24 Thread Dawid Weiss (Jira)



[ 
https://issues.apache.org/jira/browse/SOLR-14889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17201364#comment-17201364
 ] 

Dawid Weiss commented on SOLR-14889:


Darn, I didn't even look at your patch, Uwe - assumed you've solved it. :) I 
really can't do anything in the next ~7 hours or so. Will try to take a look 
after that though. 




[GitHub] [lucene-solr] sigram commented on a change in pull request #1758: SOLR-14749: Provide a clean API for cluster-level event processing, Initial draft.

2020-09-24 Thread GitBox



sigram commented on a change in pull request #1758:
URL: https://github.com/apache/lucene-solr/pull/1758#discussion_r494131970



##
File path: solr/core/src/java/org/apache/solr/core/CoreContainer.java
##
@@ -889,7 +896,37 @@ public void load() {
   ContainerPluginsApi containerPluginsApi = new ContainerPluginsApi(this);
   containerHandlers.getApiBag().registerObject(containerPluginsApi.readAPI);
   containerHandlers.getApiBag().registerObject(containerPluginsApi.editAPI);
+
+  // create the ClusterEventProducer
+  CustomContainerPlugins.ApiInfo clusterEventProducerInfo = customContainerPlugins.getPlugin(ClusterEventProducer.PLUGIN_NAME);
+  if (clusterEventProducerInfo != null) {
+    clusterEventProducer = (ClusterEventProducer) clusterEventProducerInfo.getInstance();
+  } else {
+    clusterEventProducer = new ClusterEventProducerImpl(this);
+  }
+  // init ClusterSingleton-s
+  Map singletons = new ConcurrentHashMap<>();
+  if (clusterEventProducer instanceof ClusterSingleton) {
+    singletons.put(ClusterEventProducer.PLUGIN_NAME, (ClusterSingleton) clusterEventProducer);
+  }
+
+  // register ClusterSingleton handlers
+  // XXX register also other ClusterSingleton-s from packages - how?
+  containerHandlers.keySet().forEach(handlerName -> {

Review comment:
   The purpose of this code is to build a registry of existing 
`ClusterSingleton` implementations (perhaps this should go to a dedicated 
registry class). We don't have a dependency injection framework, so we need to 
perform the discovery and registration ourselves somewhere.
   And we need a registry in order to manage the `ClusterSingleton` lifecycle 
together with the Overseer leader lifecycle.
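
   A dedicated registry class along those lines might look roughly like the sketch below. Everything in it is hypothetical: the `Singleton` interface is a minimal stand-in (the real `ClusterSingleton` contract may differ), and `onBecameLeader` / `onLostLeadership` are invented hooks that only illustrate tying the singleton lifecycle to leadership changes.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical registry sketch; names and contracts are assumptions, not Solr's API. */
final class SingletonRegistrySketch {
    /** Minimal stand-in contract for a cluster-wide singleton component. */
    interface Singleton {
        void start();
        void stop();
        boolean isRunning();
    }

    /** Trivial implementation used only to exercise the registry. */
    static final class Simple implements Singleton {
        private volatile boolean running;
        @Override public void start() { running = true; }
        @Override public void stop() { running = false; }
        @Override public boolean isRunning() { return running; }
    }

    private final Map<String, Singleton> singletons = new ConcurrentHashMap<>();

    /** Discovery step: keep the candidate only if it implements the contract. */
    boolean registerIfSingleton(String name, Object candidate) {
        if (candidate instanceof Singleton) {
            singletons.put(name, (Singleton) candidate);
            return true;
        }
        return false;
    }

    /** Invented hook: start everything when this node becomes the leader. */
    void onBecameLeader() {
        singletons.values().forEach(Singleton::start);
    }

    /** Invented hook: stop everything when leadership is lost. */
    void onLostLeadership() {
        singletons.values().forEach(Singleton::stop);
    }

    int size() {
        return singletons.size();
    }

    public static void main(String[] args) {
        SingletonRegistrySketch registry = new SingletonRegistrySketch();
        Simple producer = new Simple();
        registry.registerIfSingleton("producer", producer);
        registry.registerIfSingleton("plain-handler", new Object()); // ignored
        registry.onBecameLeader();
        System.out.println(registry.size() + " registered, running=" + producer.isRunning());
    }
}
```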






[GitHub] [lucene-solr] sigram commented on a change in pull request #1758: SOLR-14749: Provide a clean API for cluster-level event processing, Initial draft.

2020-09-24 Thread GitBox



sigram commented on a change in pull request #1758:
URL: https://github.com/apache/lucene-solr/pull/1758#discussion_r494134314



##
File path: solr/core/src/java/org/apache/solr/handler/admin/ContainerPluginsApi.java
##
@@ -64,15 +64,15 @@ public ContainerPluginsApi(CoreContainer coreContainer) {
 
   public class Read {
 @EndPoint(method = METHOD.GET,
-path = "/cluster/plugin",
+path = "/cluster/plugins",

Review comment:
   Right, but this is for 9.0 so we can break back-compat if it's justified 
- and I think it is because the singular name here doesn't make sense, as it is 
a location where multiple plugin configurations are defined.
   
   In any case we can provide a back-compat shim for 9.0 to also accept 
`plugin` singular.
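
   One possible shape for such a shim is sketched below, purely as an illustration. The class and method names are hypothetical and this is not how the PR implements it: the sketch simply rewrites the legacy singular path to the plural one before the request is dispatched.

```java
/** Hypothetical back-compat shim: maps the legacy singular path to the plural one. */
final class PluginPathShim {
    static final String CANONICAL = "/cluster/plugins";
    static final String LEGACY = "/cluster/plugin";

    /** Returns the canonical form of a request path, leaving other paths untouched. */
    static String normalize(String path) {
        if (path.equals(LEGACY)) {
            return CANONICAL;
        }
        if (path.startsWith(LEGACY + "/")) {
            // Keep whatever follows the legacy prefix, e.g. a plugin name.
            return CANONICAL + path.substring(LEGACY.length());
        }
        return path;
    }

    public static void main(String[] args) {
        System.out.println(normalize("/cluster/plugin"));
        System.out.println(normalize("/cluster/plugin/my-plugin"));
        System.out.println(normalize("/cluster/plugins"));
    }
}
```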

##
File path: solr/core/src/java/org/apache/solr/handler/admin/ContainerPluginsApi.java
##
@@ -64,15 +64,15 @@ public ContainerPluginsApi(CoreContainer coreContainer) {
 
   public class Read {
 @EndPoint(method = METHOD.GET,
-path = "/cluster/plugin",
+path = "/cluster/plugins",
 permission = PermissionNameProvider.Name.COLL_READ_PERM)
 public void list(SolrQueryRequest req, SolrQueryResponse rsp) throws IOException {
-  rsp.add(PLUGIN, plugins(zkClientSupplier));
+  rsp.add(PLUGINS, plugins(zkClientSupplier));

Review comment:
   See above.






[jira] [Updated] (SOLR-14889) improve templated variable escaping in ref-guide _config.yml

2020-09-24 Thread Uwe Schindler (Jira)



 [ 
https://issues.apache.org/jira/browse/SOLR-14889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated SOLR-14889:
-
Attachment: SOLR-14889.patch

> improve templated variable escaping in ref-guide _config.yml
> 
>
> Key: SOLR-14889
> URL: https://issues.apache.org/jira/browse/SOLR-14889
> Project: Solr
>  Issue Type: Task
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: documentation
>Reporter: Chris M. Hostetter
>Assignee: Chris M. Hostetter
>Priority: Major
> Attachments: SOLR-14889.patch, SOLR-14889.patch, SOLR-14889.patch, 
> SOLR-14889.patch
>
>




[GitHub] [lucene-solr] sigram commented on pull request #1758: SOLR-14749: Provide a clean API for cluster-level event processing, Initial draft.

2020-09-24 Thread GitBox



sigram commented on pull request #1758:
URL: https://github.com/apache/lucene-solr/pull/1758#issuecomment-698204377


   > an example of how you register some type of plugin
   
   Still open for suggestions. IMHO `ClusterEventListener`-s make sense only if 
they are also `ClusterSingleton`-s. If that's the case then there's that messy 
section in `CoreContainer` that already registers `ClusterSingleton`-s, and we 
can add a section that additionally registers any instance that implements 
`ClusterEventListener` with the `ClusterEventProducer`.
   
   > how it looks in some JSON in ZK
   
   I'm reusing the plugin-s configs from `CustomContainerPlugins` so it will 
look like any other plugin config.
   
   > If a plugin can be registered using a public API, is there a testcase for 
the same?
   
   Not yet :)




[jira] [Commented] (SOLR-14889) improve templated variable escaping in ref-guide _config.yml

2020-09-24 Thread Uwe Schindler (Jira)



[ 
https://issues.apache.org/jira/browse/SOLR-14889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17201368#comment-17201368
 ] 

Uwe Schindler commented on SOLR-14889:
--

Hi [~dweiss], hi [~hossman],

I updated the patch, here's my new one: [^SOLR-14889.patch] 

I removed the prepareLazyProps task and moved the stuff into doFirst. This has 
several goodies:
- It's an antipattern to modify global properties during a task execution, 
because depending on the order of tasks (or parallelism) this may lead to strange 
results. Because of this the doFirst clones the input map and then starts to 
add stuff.
- Up-to-date checking now works as expected, because the properties as input 
don't suddenly change. For me the task was suddenly re-executed, although I had 
already run it several times. Now this is solved. prepareSources only runs if 
input files change (sync task), or the modifiable templateProps input changes. 
It is also re-executed if the configuration changes, so I also added "dependsOn 
configurations.depVer".
- The properties are printed out.
- After printing them out they are assigned to a FINAL local variable, which is 
passed to expand. It's declared final to make sure the map reference stays the 
same.

[~hossman]: you can now change the nocommits back to their original state, 
looks like it works as expected. I am happy with it now.




[jira] [Comment Edited] (SOLR-14889) improve templated variable escaping in ref-guide _config.yml

2020-09-24 Thread Uwe Schindler (Jira)



[ 
https://issues.apache.org/jira/browse/SOLR-14889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17201368#comment-17201368
 ] 

Uwe Schindler edited comment on SOLR-14889 at 9/24/20, 8:43 AM:


Hi [~dweiss], hi [~hossman],

I updated the patch, here's my new one: [^SOLR-14889.patch] 

I removed the prepareLazyProps task and moved the stuff into doFirst. This has 
several goodies:
- It's an antipattern to modify global properties during a task execution, 
because depending on the order of tasks (or parallelism) this may lead to strange 
results. Because of this the doFirst clones the input map and then starts to 
add stuff.
- Up-to-date checking now works as expected, because the properties as input 
don't suddenly change. For me the task was suddenly re-executed, although I had 
already run it several times. Now this is solved. prepareSources only runs if 
input files change (sync task), or the modifiable templateProps input changes. 
It is also re-executed if the configuration changes, so I also added "dependsOn 
configurations.depVer".
- The properties are printed out.
- After printing them out they are added to the FINAL local Map variable that 
was previously declared (in the configuration phase) and passed to expand. It's 
declared final to make sure the map reference stays the same, so expand() always 
sees only one instance.

[~hossman]: you can now change the nocommits back to their original state, 
looks like it works as expected. I am happy with it now.


was (Author: thetaphi):
Hi [~dweiss], hi [~hossman],

I updated the patch, here's my new one: [^SOLR-14889.patch] 

I removed the prepareLazyProps task and moved the stuff into doFirst. This has 
several goodies:
- It's an antipattern to modify global properties during task execution, 
because depending on the order of tasks (or parallelism) this may lead to 
strange results. Because of this, the doFirst clones the input map and then 
starts to add stuff.
- Up-to-date checks now work as expected, because the properties used as input 
don't suddenly change. For me the task was unexpectedly re-executed even when 
running it several times in a row. Now this is solved. prepareSources only runs 
if input files change (sync task), or the modifiable templateProps input 
changes. It is also re-executed if the configuration changes, so I also added 
"dependsOn configurations.depVer".
- The properties are printed out.
- After printing them out, they are assigned to a FINAL local variable, which 
is passed to expand. It's declared final to make sure the map reference stays 
the same.

[~hossman]: you can now change the nocommits back to their original state, 
looks like it works as expected. I am happy with it now.

> improve templated variable escaping in ref-guide _config.yml
> 
>
> Key: SOLR-14889
> URL: https://issues.apache.org/jira/browse/SOLR-14889
> Project: Solr
>  Issue Type: Task
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: documentation
>Reporter: Chris M. Hostetter
>Assignee: Chris M. Hostetter
>Priority: Major
> Attachments: SOLR-14889.patch, SOLR-14889.patch, SOLR-14889.patch, 
> SOLR-14889.patch
>
>
> SOLR-14824 ran into Windows failures when we switched from using a hardcoded 
> "relative" path to the solrRootPath to using groovy/project variables to get 
> the path. The reason for the failures was that the path is used as a 
> variable templated into {{_config.yml.template}} to build the {{_config.yml}} 
> file, but on Windows the path separator '\' was being parsed by 
> jekyll/YAML as a string escape character.
> (This wasn't a problem we ran into before, even on Windows, prior to the 
> SOLR-14824 changes, because the hardcoded relative path only used '/' 
> delimiters, which (j)ruby was happy to work with, even on Windows.)
> As Uwe pointed out when hotfixing this...
> {quote}Problem was that backslashes are used to escape strings, but Windows 
> paths also have those. Fix was to add StringEscapeUtils, but I don't like 
> this too much. Maybe we find a better solution to make special characters in 
> those properties escaped correctly when used in strings inside templates.
> {quote}
> ...the current fix of using {{StringEscapeUtils.escapeJava}} -- only for this 
> one variable -- doesn't really protect other variables that might have 
> special characters in them down the road, and while "escapeJava" works OK for 
> the "\" issue, it isn't necessarily consistent with all YAML escapes, which 
> could lead to even weirder bugs/confusion down the road.
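
To make the escaping concern in the description above concrete, here is a 
minimal sketch of YAML-aware escaping in Java. It is not the code from any of 
the attached patches; the class and method names (YamlEscape, doubleQuoted, 
singleQuoted) are hypothetical. It assumes the templated value ends up either 
inside a double-quoted YAML scalar, where backslash is the escape character, 
or inside a single-quoted scalar, where backslash has no special meaning.

```java
/** Hypothetical helper, for illustration only; not part of the Solr build. */
final class YamlEscape {
  private YamlEscape() {}

  /** Escapes a value for use inside a double-quoted YAML scalar ("..."). */
  static String doubleQuoted(String value) {
    StringBuilder sb = new StringBuilder(value.length() + 8);
    for (int i = 0; i < value.length(); i++) {
      char c = value.charAt(i);
      switch (c) {
        case '\\': sb.append("\\\\"); break; // backslash must be doubled
        case '"':  sb.append("\\\""); break; // quote would end the scalar
        case '\n': sb.append("\\n");  break;
        case '\r': sb.append("\\r");  break;
        case '\t': sb.append("\\t");  break;
        default:   sb.append(c);
      }
    }
    return sb.toString();
  }

  /**
   * Escapes a value for use inside a single-quoted YAML scalar ('...').
   * Backslashes are literal there; only the single quote is doubled.
   * Suited to single-line values such as file system paths.
   */
  static String singleQuoted(String value) {
    return value.replace("'", "''");
  }
}
```

With single quotes in the template, a Windows path would need no backslash 
handling at all, which may be the simpler option for path-like properties.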




[jira] [Commented] (SOLR-14889) improve templated variable escaping in ref-guide _config.yml

2020-09-24 Thread Dawid Weiss (Jira)



[ 
https://issues.apache.org/jira/browse/SOLR-14889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17201372#comment-17201372
 ] 

Dawid Weiss commented on SOLR-14889:


I glanced at it and I think it's good, will verify in the afternoon.

> improve templated variable escaping in ref-guide _config.yml
> 
>
> Key: SOLR-14889
> URL: https://issues.apache.org/jira/browse/SOLR-14889
> Project: Solr
>  Issue Type: Task
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: documentation
>Reporter: Chris M. Hostetter
>Assignee: Chris M. Hostetter
>Priority: Major
> Attachments: SOLR-14889.patch, SOLR-14889.patch, SOLR-14889.patch, 
> SOLR-14889.patch
>
>
> SOLR-14824 ran into Windows failures when we switched from using a hardcoded 
> "relative" path to the solrRootPath to using groovy/project variables to get 
> the path. The reason for the failures was that the path is used as a 
> variable templated into {{_config.yml.template}} to build the {{_config.yml}} 
> file, but on Windows the path separator '\' was being parsed by 
> jekyll/YAML as a string escape character.
> (This wasn't a problem we ran into before, even on Windows, prior to the 
> SOLR-14824 changes, because the hardcoded relative path only used '/' 
> delimiters, which (j)ruby was happy to work with, even on Windows.)
> As Uwe pointed out when hotfixing this...
> {quote}Problem was that backslashes are used to escape strings, but Windows 
> paths also have those. Fix was to add StringEscapeUtils, but I don't like 
> this too much. Maybe we find a better solution to make special characters in 
> those properties escaped correctly when used in strings inside templates.
> {quote}
> ...the current fix of using {{StringEscapeUtils.escapeJava}} -- only for this 
> one variable -- doesn't really protect other variables that might have 
> special characters in them down the road, and while "escapeJava" works OK for 
> the "\" issue, it isn't necessarily consistent with all YAML escapes, which 
> could lead to even weirder bugs/confusion down the road.




[jira] [Commented] (SOLR-14613) Provide a clean API for pluggable replica assignment implementations

2020-09-24 Thread Ilan Ginzburg (Jira)



[ 
https://issues.apache.org/jira/browse/SOLR-14613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17201378#comment-17201378
 ] 

Ilan Ginzburg commented on SOLR-14613:
--

[~noble.paul] could we change {{setClusterProperty(String propertyName, Object 
propertyValue)}} in {{ClusterProperties}} to replace the existing property 
value with the passed one, or do we really need this "partial" update behavior? 
(where existing Json in {{clusterprops.json}} for {{propertyName}} is merged 
rather than replaced with the new Json of {{propertyValue}})

If we can't change the existing method, I suggest the added one to have a 
consistent signature, i.e. define *{{update(String propertyName, Object 
propertyValue)}}* rather than {{update(MapWriter obj, String... path)}}. This 
will make {{ClusterAPI}} easier to read and move the configuration management 
implementation details to {{ClusterProperties}} where they belong.
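
The difference between the two behaviors discussed above can be sketched in a 
few lines of Java. This is an in-memory illustration only, not the actual 
{{ClusterProperties}} code: the class ClusterPropsSketch and its methods are 
hypothetical, nothing is written to ZooKeeper, and the merge shown is shallow 
(one level of keys), which is an assumption about what "partial" update means.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical in-memory stand-in for a cluster properties store. */
final class ClusterPropsSketch {
  private final Map<String, Object> props = new LinkedHashMap<>();

  /** Replace semantics: the stored value becomes exactly the passed value. */
  void replace(String propertyName, Object propertyValue) {
    props.put(propertyName, propertyValue);
  }

  /**
   * Merge ("partial update") semantics: when both the existing and the new
   * value are maps, keys missing from the new value are kept.
   */
  @SuppressWarnings("unchecked")
  void merge(String propertyName, Object propertyValue) {
    Object existing = props.get(propertyName);
    if (existing instanceof Map && propertyValue instanceof Map) {
      Map<String, Object> merged =
          new LinkedHashMap<>((Map<String, Object>) existing);
      merged.putAll((Map<String, Object>) propertyValue);
      props.put(propertyName, merged);
    } else {
      // Nothing to merge with, so fall back to a plain replace.
      props.put(propertyName, propertyValue);
    }
  }

  Object get(String propertyName) {
    return props.get(propertyName);
  }
}
```

With merge semantics a caller cannot remove a nested key by omitting it, which 
is one reason a plain replace is easier to reason about.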

> Provide a clean API for pluggable replica assignment implementations
> 
>
> Key: SOLR-14613
> URL: https://issues.apache.org/jira/browse/SOLR-14613
> Project: Solr
>  Issue Type: Improvement
>  Components: AutoScaling
>Reporter: Andrzej Bialecki
>Assignee: Ilan Ginzburg
>Priority: Major
>  Time Spent: 41h
>  Remaining Estimate: 0h
>
> As described in SIP-8 the current autoscaling Policy implementation has 
> several limitations that make it difficult to use for very large clusters and 
> very large collections. SIP-8 also mentions the possible migration path by 
> providing alternative implementations of the placement strategies that are 
> less complex but more efficient in these very large environments.
> We should review the existing APIs that the current autoscaling engine uses 
> ({{SolrCloudManager}} , {{AssignStrategy}} , {{Suggester}} and related 
> interfaces) to see if they provide a sufficient and minimal API for plugging 
> in alternative autoscaling placement strategies, and if necessary refactor 
> the existing APIs.
> Since these APIs are internal it should be possible to do this without 
> breaking back-compat.




[jira] [Commented] (SOLR-14843) Define strongly-typed cluster configuration API

2020-09-24 Thread Ilan Ginzburg (Jira)



[ 
https://issues.apache.org/jira/browse/SOLR-14843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17201427#comment-17201427
 ] 

Ilan Ginzburg commented on SOLR-14843:
--

Would an initial step in this Jira be the ability to define a way to "prime" 
the Zookeeper based configuration with something coming from a file?

Motivation:

The plugin based replica placement code is in master, but the default placement 
strategy is {{LEGACY}}. To use the new placement strategy a specific config 
must be set on a running SolrCloud cluster using a {{curl}} command to the 
config API.

To make plugin placement the default strategy now or later (now would be good 
so it gets some baking time...) yet be able to switch back a cluster to 
{{LEGACY}} if needed (by removing that configuration), a configuration needs to 
somehow be automatically pushed to {{/clusterprops.json}}, but there's no 
support for doing that.

I believe priming a ZK config with content from a file is not easy (handle new 
configs or changes to default configs coming with a new release, deal with 
existing configuration in ZK to not overwrite it etc.) and is not my preferred 
way of dealing with these types of configs, but we do need something.

 

> Define strongly-typed cluster configuration API
> ---
>
> Key: SOLR-14843
> URL: https://issues.apache.org/jira/browse/SOLR-14843
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Andrzej Bialecki
>Priority: Major
>  Labels: clean-api
> Fix For: master (9.0)
>
>
> Current cluster-level configuration uses a hodgepodge of traditional Solr 
> config sources (solr.xml, system properties) and the new somewhat arbitrary 
> config files kept in ZK ({{/clusterprops.json, /security.json, 
> /packages.json, /autoscaling.json}} etc...). There's no uniform 
> strongly-typed API to access and manage these configs - currently each config 
> source has its own CRUD, often relying on direct access to Zookeeper. There's 
> also no uniform method for monitoring changes to these config sources.
> This issue proposes a uniform config API facade with the following 
> characteristics:
>  * Using a single hierarchical (or at least key-based) facade for accessing 
> any global config.
>  * Using strongly-typed sub-system configs instead of opaque Map-s: 
> components would no longer deal with JSON parsing/writing, instead they would 
> use properly annotated Java objects for config CRUD. Config objects would 
> include versioning information (eg. lastModified timestamp).
>  * Isolating access to the underlying config persistence layer: components 
> would no longer directly interact with Zookeeper or files. Most likely the 
> default implementation would continue using different ZK files per-subsystem 
> in order to limit the complexity of file formats and to reduce the cost of 
> notifications for unmodified parts of the configs.
>  * Providing uniform way to register listeners for monitoring changes in 
> specific configs: components would no longer need to interact with ZK 
> watches, they would instead be notified about modified configs that they are 
> interested in.
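
As a rough illustration of the characteristics listed above (key-based access, 
typed config objects with versioning, listeners instead of ZK watches), here 
is a minimal in-memory sketch in Java. Everything in it is hypothetical: 
ConfigFacadeSketch, PlacementConfig and the method names are invented for 
this example and are not a proposed Solr API, and persistence is left out 
entirely.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical facade; persistence (ZK, files) is deliberately omitted. */
final class ConfigFacadeSketch {
  /** Example of a typed sub-system config replacing an opaque Map. */
  static final class PlacementConfig {
    final String strategy;
    PlacementConfig(String strategy) { this.strategy = strategy; }
  }

  /** Stored value plus a simple version counter. */
  private static final class Entry {
    final Object value;
    final long version;
    Entry(Object value, long version) { this.value = value; this.version = version; }
  }

  private final Map<String, Entry> store = new HashMap<>();
  private final Map<String, List<Consumer<Object>>> listeners = new HashMap<>();

  /** Stores a config under a key, bumps its version, notifies listeners. */
  synchronized long put(String key, Object config) {
    Entry old = store.get(key);
    long version = (old == null) ? 1L : old.version + 1;
    store.put(key, new Entry(config, version));
    for (Consumer<Object> listener
        : listeners.getOrDefault(key, Collections.<Consumer<Object>>emptyList())) {
      listener.accept(config);
    }
    return version;
  }

  /** Typed read: callers never parse JSON or cast by hand. */
  synchronized <T> T get(String key, Class<T> type) {
    Entry entry = store.get(key);
    return entry == null ? null : type.cast(entry.value);
  }

  /** Registers a listener that is called only for changes to one key. */
  synchronized <T> void addListener(String key, Class<T> type, Consumer<? super T> listener) {
    listeners.computeIfAbsent(key, k -> new ArrayList<>())
        .add(value -> listener.accept(type.cast(value)));
  }
}
```

A component would then call something like addListener("placement", 
PlacementConfig.class, ...) and never touch the persistence layer directly.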




[jira] [Commented] (SOLR-14354) HttpShardHandler send requests in async

2020-09-24 Thread Cao Manh Dat (Jira)



[ 
https://issues.apache.org/jira/browse/SOLR-14354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17201437#comment-17201437
 ] 

Cao Manh Dat commented on SOLR-14354:
-

[~ichattopadhyaya] sorry, but I gave solr-bench a shot and it seems that with 
the default config-local.json it finishes too quickly; even when I tried to 
increase the size of queryLog (56k queries), the total time seemed unchanged. 
With a total time of 500ms for the query benchmark, the result can't say 
anything at all. 

Again, I'm OK with reverting, but I'm kind of sad when good things are unable 
to reach users soon. 

[~sarkaramr...@gmail.com] I heard that you have a toolbox for benchmarking 
in a heavy scenario. If possible, can you do that and post the result here? It 
will be a big help for us and our users.

> HttpShardHandler send requests in async
> ---
>
> Key: SOLR-14354
> URL: https://issues.apache.org/jira/browse/SOLR-14354
> Project: Solr
>  Issue Type: Improvement
>Reporter: Cao Manh Dat
>Assignee: Cao Manh Dat
>Priority: Blocker
> Fix For: master (9.0), 8.7
>
> Attachments: image-2020-03-23-10-04-08-399.png, 
> image-2020-03-23-10-09-10-221.png, image-2020-03-23-10-12-00-661.png
>
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> h2. 1. Current approach (problem) of Solr
> Below is a diagram describing how a request is currently handled.
> !image-2020-03-23-10-04-08-399.png!
> The main thread that handles the search request submits n requests (n 
> equals the number of shards) to an executor, so each request corresponds 
> to a thread. After sending a request, that thread basically does nothing but 
> wait for the response from the other side. That thread will be swapped out and 
> the CPU will try to handle another thread (this is called a context switch: 
> the CPU saves the context of the current thread and switches to another one). 
> When some data (not all) comes back, that thread is woken up to parse the data, 
> then it waits until more data comes back. So there will be lots of context 
> switching in the CPU. That is quite an inefficient use of threads. Basically we 
> want fewer threads, and most of them must be busy all the time, because threads 
> are not free, and neither is context switching. That is the main idea behind 
> everything, like executor
> h2. 2. Async call of Jetty HttpClient
> Jetty HttpClient offers async API like this.
> {code:java}
> httpClient.newRequest("http://domain.com/path")
> // Add request hooks
> .onRequestQueued(request -> { ... })
> .onRequestBegin(request -> { ... })
> // Add response hooks
> .onResponseBegin(response -> { ... })
> .onResponseHeaders(response -> { ... })
> .onResponseContent((response, buffer) -> { ... })
> .send(result -> { ... }); {code}
> Therefore, after calling {{send()}} the thread returns immediately without 
> blocking. Then, when the client receives the headers from the other side, it 
> calls the {{onHeaders()}} listeners. When the client receives some {{byte[]}} 
> (not the whole response) of the data, it calls the {{onContent(buffer)}} 
> listeners. When everything has finished, it calls the {{onComplete}} listeners. 
> One main thing to notice here is that all listeners should finish quickly: if a 
> listener blocks, all further data of that request won’t be handled until the 
> listener finishes.
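
The non-blocking model described in sections 1 and 2 above can be sketched 
with the JDK alone. This is only an analogy, not Jetty code and not the 
HttpShardHandler change itself: AsyncFanOutSketch is a hypothetical class, 
the "network" is simulated by a single scheduler thread, and the shard names 
are made up. The point it tries to show is that sending returns immediately 
and that n in-flight requests do not need n parked threads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch; one scheduler thread stands in for the I/O layer. */
final class AsyncFanOutSketch {
  private AsyncFanOutSketch() {}

  /** Simulated async send: returns at once, completes later on the I/O thread. */
  static CompletableFuture<String> sendAsync(ScheduledExecutorService io, String shard) {
    CompletableFuture<String> response = new CompletableFuture<>();
    io.schedule(() -> {
      // Callback work should stay short, as noted for the Jetty listeners.
      response.complete("response from " + shard);
    }, 10, TimeUnit.MILLISECONDS);
    return response;
  }

  /** Sends to every shard first, then waits once for all responses. */
  static List<String> query(List<String> shards) {
    ScheduledExecutorService io = Executors.newSingleThreadScheduledExecutor();
    try {
      List<CompletableFuture<String>> pending = new ArrayList<>();
      for (String shard : shards) {
        pending.add(sendAsync(io, shard)); // does not block
      }
      // A single wait for the whole batch instead of one blocked thread per shard.
      CompletableFuture.allOf(pending.toArray(new CompletableFuture<?>[0])).join();
      List<String> results = new ArrayList<>();
      for (CompletableFuture<String> future : pending) {
        results.add(future.join()); // already completed, returns immediately
      }
      return results;
    } finally {
      io.shutdown();
    }
  }
}
```

A fully asynchronous handler would chain further callbacks instead of calling 
join(), but the thread accounting is the same: one I/O thread serves all 
shards.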
> h2. 3. Solution 1: Sending requests async but spin one thread per response
>  Jetty HttpClient already provides several listeners; one of them is 
> InputStreamResponseListener. This is how it is used:
> {code:java}
> InputStreamResponseListener listener = new InputStreamResponseListener();
> client.newRequest(...).send(listener);
> // Wait for the response headers to arrive
> Response response = listener.get(5, TimeUnit.SECONDS);
> if (response.getStatus() == 200) {
>   // Obtain the input stream on the response content
>   try (InputStream input = listener.getInputStream()) {
>     // Read the response content
>   }
> } {code}
> In this case, there will be 2 threads:
>  * one thread trying to read the response content from the InputStream
>  * one thread (this is a short-lived task) feeding content to the above 
> InputStream whenever some byte[] is available. Note that if this thread 
> is unable to feed data into the InputStream, it will wait.
> By using this one, the model of HttpShardHandler can be rewritten into 
> something like this
> {code:java}
> handler.sendReq(req, (is) -> {
>   // Hand the stream to a worker so the listener thread is not blocked
>   executor.submit(() -> {
>     try (is) {
>       // Read the content from InputStream
>     }
>   });
> }); {code}
>  The first diagram will be changed into this
> !image-2020-03-23-10-09-10-221.png!
> Notice that although “sending req to shard1” is wide, it won’t take long time 
> since sending req is a very quic

[GitHub] [lucene-solr] mocobeta commented on pull request #1836: LUCENE-9317: Clean up split package in analyzers-common

2020-09-24 Thread GitBox



mocobeta commented on pull request #1836:
URL: https://github.com/apache/lucene-solr/pull/1836#issuecomment-698267536


   > If you have any questions, please ask me on slack, there I can respond 
faster.
   
   We already have Jira and Github, I'd rather not disperse discussions onto 
multiple platforms any more...
   Is Jira/Github mentions insufficient for us?




[GitHub] [lucene-solr] mocobeta edited a comment on pull request #1836: LUCENE-9317: Clean up split package in analyzers-common

2020-09-24 Thread GitBox



mocobeta edited a comment on pull request #1836:
URL: https://github.com/apache/lucene-solr/pull/1836#issuecomment-698267536


   > If you have any questions, please ask me on slack, there I can respond 
faster.
   
   We already have Jira and Github, I'd rather not disperse discussions onto 
multiple platforms any more...
   Is Jira/Github mention insufficient for us?




[GitHub] [lucene-solr] mikemccand commented on pull request #1912: LUCENE-9535: Try to do larger flushes.

2020-09-24 Thread GitBox



mikemccand commented on pull request #1912:
URL: https://github.com/apache/lucene-solr/pull/1912#issuecomment-698332632


   Oh my!
   
   We are talking about assigning DWPT to incoming indexing thread, right?  
(And not which DWPT to pick for flushing because RAM buffer is full, which I 
think is currently "the biggest one/s"?).
   
   I think this is a good idea.  It will tend to make the biggest DWPTs even 
bigger, especially when there is high variance of how many threads are indexing 
at once over time, until a flush is triggered.  I do not remember why we 
switched to "last DWPT".  Long ago we did have thread affinity, so the same 
indexing thread would try to get the same DWPT, but we moved away from that 
quite a while ago.
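
   For readers less familiar with the idea, handing the largest free DWPT to 
an incoming indexing thread can be pictured with the toy Java sketch below. It 
is not Lucene's pool implementation: LargestFirstPoolSketch and its nested 
Writer class are hypothetical stand-ins, and bytesUsed is just a plain field 
here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

/** Toy pool, for illustration only; names are hypothetical. */
final class LargestFirstPoolSketch {
  /** Stand-in for a per-thread writer with some buffered bytes. */
  static final class Writer {
    long bytesUsed;
    Writer(long bytesUsed) { this.bytesUsed = bytesUsed; }
  }

  private final List<Writer> free = new ArrayList<>();

  /** Returns the free writer with the most buffered bytes, or a new empty one. */
  synchronized Writer checkout() {
    if (free.isEmpty()) {
      return new Writer(0L);
    }
    Writer largest =
        Collections.max(free, Comparator.comparingLong((Writer w) -> w.bytesUsed));
    free.remove(largest);
    return largest;
  }

  /** Puts a writer back so a later indexing thread can reuse it. */
  synchronized void release(Writer writer) {
    free.add(writer);
  }
}
```

   Because the fullest writer is always reused first, buffered bytes tend to 
concentrate in a few writers rather than being spread thinly across many.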




[GitHub] [lucene-solr] dweiss commented on a change in pull request #1917: LUCENE-9535: Make ByteBuffersDataOutput#ramBytesUsed run in constant-time.

2020-09-24 Thread GitBox



dweiss commented on a change in pull request #1917:
URL: https://github.com/apache/lucene-solr/pull/1917#discussion_r494311008



##
File path: 
lucene/core/src/java/org/apache/lucene/store/ByteBuffersDataOutput.java
##
@@ -400,8 +400,13 @@ public void writeSetOfStrings(Set set) {
   public long ramBytesUsed() {
 // Return a rough estimation for allocated blocks. Note that we do not make
 // any special distinction for direct memory buffers.
-return RamUsageEstimator.NUM_BYTES_OBJECT_REF * blocks.size() + 
-   blocks.stream().mapToLong(buf -> buf.capacity()).sum();
+ByteBuffer first = blocks.peek();
+if (first == null) {
+  return 0L;
+} else {
+  // All blocks have the same capacity.

Review comment:
   Hmmm... Do they? I don't think this is the case, in general, since you 
can plug in an arbitrary block provider/ recycler?






[GitHub] [lucene-solr] dweiss commented on a change in pull request #1917: LUCENE-9535: Make ByteBuffersDataOutput#ramBytesUsed run in constant-time.

2020-09-24 Thread GitBox



dweiss commented on a change in pull request #1917:
URL: https://github.com/apache/lucene-solr/pull/1917#discussion_r494314029



##
File path: 
lucene/core/src/java/org/apache/lucene/store/ByteBuffersDataOutput.java
##
@@ -400,8 +400,13 @@ public void writeSetOfStrings(Set set) {
   public long ramBytesUsed() {
 // Return a rough estimation for allocated blocks. Note that we do not make
 // any special distinction for direct memory buffers.
-return RamUsageEstimator.NUM_BYTES_OBJECT_REF * blocks.size() + 
-   blocks.stream().mapToLong(buf -> buf.capacity()).sum();
+ByteBuffer first = blocks.peek();
+if (first == null) {
+  return 0L;
+} else {
+  // All blocks have the same capacity.

Review comment:
   I'm not sure I like this assumption (that all blocks are equal). Maybe 
we could maintain a separate ram usage on block addition/ removal and thus make 
the sum-time constant but also accurate?
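
   A minimal sketch of that suggestion, assuming blocks are only ever added 
and removed through two methods: keep a running total that is adjusted on 
each change, so the getter is constant-time yet stays accurate when block 
capacities differ. TrackedBlocksSketch is a hypothetical class, not the 
actual ByteBuffersDataOutput code, and REF_BYTES is an assumed stand-in for 
RamUsageEstimator.NUM_BYTES_OBJECT_REF.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;

/** Hypothetical illustration of incremental RAM accounting. */
final class TrackedBlocksSketch {
  /** Assumed size of an object reference; the real value is JVM-dependent. */
  private static final int REF_BYTES = 8;

  private final ArrayDeque<ByteBuffer> blocks = new ArrayDeque<>();
  private long ramBytesUsed;

  /** Appends a block and adds its footprint to the running total. */
  void addBlock(ByteBuffer block) {
    blocks.addLast(block);
    ramBytesUsed += REF_BYTES + block.capacity();
  }

  /** Removes the oldest block and subtracts its footprint again. */
  ByteBuffer removeFirstBlock() {
    ByteBuffer block = blocks.removeFirst();
    ramBytesUsed -= REF_BYTES + block.capacity();
    return block;
  }

  /** Constant-time: no iteration over the blocks. */
  long ramBytesUsed() {
    return ramBytesUsed;
  }
}
```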






[GitHub] [lucene-solr] jpountz commented on a change in pull request #1917: LUCENE-9535: Make ByteBuffersDataOutput#ramBytesUsed run in constant-time.

2020-09-24 Thread GitBox



jpountz commented on a change in pull request #1917:
URL: https://github.com/apache/lucene-solr/pull/1917#discussion_r494317183



##
File path: 
lucene/core/src/java/org/apache/lucene/store/ByteBuffersDataOutput.java
##
@@ -400,8 +400,13 @@ public void writeSetOfStrings(Set set) {
   public long ramBytesUsed() {
 // Return a rough estimation for allocated blocks. Note that we do not make
 // any special distinction for direct memory buffers.
-return RamUsageEstimator.NUM_BYTES_OBJECT_REF * blocks.size() + 
-   blocks.stream().mapToLong(buf -> buf.capacity()).sum();
+ByteBuffer first = blocks.peek();
+if (first == null) {
+  return 0L;
+} else {
+  // All blocks have the same capacity.

Review comment:
   Oops thanks for catching. I thought they did because we make the 
assumption that all buffers have the same length in toDataInput, which is 
different from capacity.






[GitHub] [lucene-solr] jpountz commented on a change in pull request #1917: LUCENE-9535: Make ByteBuffersDataOutput#ramBytesUsed run in constant-time.

2020-09-24 Thread GitBox



jpountz commented on a change in pull request #1917:
URL: https://github.com/apache/lucene-solr/pull/1917#discussion_r494317441



##
File path: 
lucene/core/src/java/org/apache/lucene/store/ByteBuffersDataOutput.java
##
@@ -400,8 +400,13 @@ public void writeSetOfStrings(Set set) {
   public long ramBytesUsed() {
 // Return a rough estimation for allocated blocks. Note that we do not make
 // any special distinction for direct memory buffers.
-return RamUsageEstimator.NUM_BYTES_OBJECT_REF * blocks.size() + 
-   blocks.stream().mapToLong(buf -> buf.capacity()).sum();
+ByteBuffer first = blocks.peek();
+if (first == null) {
+  return 0L;
+} else {
+  // All blocks have the same capacity.

Review comment:
   Yep, I'll do that.






[jira] [Commented] (SOLR-14891) Upgrade Jetty to 9.4.28+ to fix Startup Warning

2020-09-24 Thread Cassandra Targett (Jira)



[ 
https://issues.apache.org/jira/browse/SOLR-14891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17201531#comment-17201531
 ] 

Cassandra Targett commented on SOLR-14891:
--

I think this is a duplicate of SOLR-14835.

> Upgrade Jetty to 9.4.28+ to fix Startup Warning
> ---
>
> Key: SOLR-14891
> URL: https://issues.apache.org/jira/browse/SOLR-14891
> Project: Solr
>  Issue Type: Wish
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 8.6.2
>Reporter: Bernd Wahlen
>Priority: Minor
>
> Solr is currently using Jetty 9.4.27, which displays a strange warning at startup.
> I think it is fixed in 9.4.28
> https://github.com/eclipse/jetty.project/issues/4631
> 2020-09-23 09:57:57.346 WARN  (main) [   ] o.e.j.x.XmlConfiguration Ignored 
> arg: 
>  class="com.codahale.metrics.jetty9.InstrumentedQueuedThreadPool"> name="registry">
>  class="com.codahale.metrics.SharedMetricRegistries">solr.jetty
>   
>   




[jira] [Resolved] (SOLR-14891) Upgrade Jetty to 9.4.28+ to fix Startup Warning

2020-09-24 Thread Bernd Wahlen (Jira)



 [ 
https://issues.apache.org/jira/browse/SOLR-14891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bernd Wahlen resolved SOLR-14891.
-
Resolution: Duplicate

> Upgrade Jetty to 9.4.28+ to fix Startup Warning
> ---
>
> Key: SOLR-14891
> URL: https://issues.apache.org/jira/browse/SOLR-14891
> Project: Solr
>  Issue Type: Wish
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 8.6.2
>Reporter: Bernd Wahlen
>Priority: Minor
>
> Solr is currently using Jetty 9.4.27, which displays a strange warning at startup.
> I think it is fixed in 9.4.28
> https://github.com/eclipse/jetty.project/issues/4631
> 2020-09-23 09:57:57.346 WARN  (main) [   ] o.e.j.x.XmlConfiguration Ignored 
> arg: 
>  class="com.codahale.metrics.jetty9.InstrumentedQueuedThreadPool"> name="registry">
>  class="com.codahale.metrics.SharedMetricRegistries">solr.jetty
>   
>   




[jira] [Commented] (SOLR-14835) Solr 8.6.x log starts with "XmlConfiguration Ignored arg" warning from Jetty

2020-09-24 Thread Bernd Wahlen (Jira)



[ 
https://issues.apache.org/jira/browse/SOLR-14835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17201533#comment-17201533
 ] 

Bernd Wahlen commented on SOLR-14835:
-

Solr is currently using Jetty 9.4.27, which displays a strange warning at startup.
I think it is fixed in 9.4.28
https://github.com/eclipse/jetty.project/issues/4631

> Solr 8.6.x log starts with "XmlConfiguration Ignored arg" warning from Jetty
> 
>
> Key: SOLR-14835
> URL: https://issues.apache.org/jira/browse/SOLR-14835
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 8.6.2
>Reporter: Colvin Cowie
>Assignee: Andrzej Bialecki
>Priority: Trivial
>
> After moving to 8.6.2 the first lines of the solr.log are
> {noformat}
> 2020-09-06 18:19:09.164 INFO  (main) [   ] o.e.j.u.log Logging initialized 
> @1197ms to org.eclipse.jetty.util.log.Slf4jLog
> 2020-09-06 18:19:09.226 WARN  (main) [   ] o.e.j.u.l.o.e.j.x.XmlConfiguration 
> Ignored arg: 
>  class="com.codahale.metrics.jetty9.InstrumentedQueuedThreadPool"> name="registry">
>  class="com.codahale.metrics.SharedMetricRegistries">solr.jetty
>   
>   
> {noformat}
> This config is declared here: 
> https://github.com/apache/lucene-solr/blob/5154b6008f54c9d096f5efe9ae347492c23dd780/solr/server/etc/jetty.xml#L33
>  and has been there for a long time, so I assume it's the bump in Jetty 
> version that's causing it now.
> I'm seeing this in 8.6.2, but I've not gone back to check other versions




[jira] [Commented] (SOLR-14891) Upgrade Jetty to 9.4.28+ to fix Startup Warning

2020-09-24 Thread Bernd Wahlen (Jira)



[ 
https://issues.apache.org/jira/browse/SOLR-14891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17201535#comment-17201535
 ] 

Bernd Wahlen commented on SOLR-14891:
-

thanks => close/duplicate

> Upgrade Jetty to 9.4.28+ to fix Startup Warning
> ---
>
> Key: SOLR-14891
> URL: https://issues.apache.org/jira/browse/SOLR-14891
> Project: Solr
>  Issue Type: Wish
>  Security Level: Public(Default Security Level. Issues are Public) 
>Affects Versions: 8.6.2
>Reporter: Bernd Wahlen
>Priority: Minor
>
> Solr is currently using Jetty 9.4.27, which displays a strange warning at startup.
> I think it is fixed in 9.4.28
> https://github.com/eclipse/jetty.project/issues/4631
> 2020-09-23 09:57:57.346 WARN  (main) [   ] o.e.j.x.XmlConfiguration Ignored 
> arg: 
>  class="com.codahale.metrics.jetty9.InstrumentedQueuedThreadPool"> name="registry">
>  class="com.codahale.metrics.SharedMetricRegistries">solr.jetty
>   
>   




[jira] [Commented] (SOLR-14889) improve templated variable escaping in ref-guide _config.yml

2020-09-24 Thread Dawid Weiss (Jira)



[ 
https://issues.apache.org/jira/browse/SOLR-14889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17201550#comment-17201550
 ] 

Dawid Weiss commented on SOLR-14889:


Yep, Uwe's version looks good to me. Added a few additional unrelated cleanups 
(imports, comments, warning from jekyll). Seems good to go.

> improve templated variable escaping in ref-guide _config.yml
> 
>
> Key: SOLR-14889
> URL: https://issues.apache.org/jira/browse/SOLR-14889
> Project: Solr
>  Issue Type: Task
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: documentation
>Reporter: Chris M. Hostetter
>Assignee: Chris M. Hostetter
>Priority: Major
> Attachments: SOLR-14889.patch, SOLR-14889.patch, SOLR-14889.patch, 
> SOLR-14889.patch, SOLR-14889.patch
>
>
> SOLR-14824 ran into Windows failures when we switched from using a hardcoded 
> "relative" path to the solrRootPath to using groovy/project variables to get 
> the path.  The reason for the failures was that the path is used as a 
> variable templated into {{_config.yml.template}} to build the {{_config.yml}} 
> file, but on Windows the path separator '\' was being parsed by 
> jekyll/YAML as a string escape character.
> (This wasn't a problem we ran into before, even on Windows, prior to the 
> SOLR-14824 changes, because the hardcoded relative path only used '/' 
> delimiters, which (j)ruby was happy to work with, even on Windows.)
> As Uwe pointed out when hotfixing this...
> {quote}Problem was that backslashes are used to escape strings, but windows 
> paths also have those. Fix was to add StringEscapeUtils, but I don't like 
> this too much. Maybe we find a better solution to make special characters in 
> those properties escaped correctly when used in strings inside templates.
> {quote}
> ...the current fix of using {{StringEscapeUtils.escapeJava}} -- only for this 
> one variable -- doesn't really protect other variables that might have 
> special characters in them down the road, and while "escapeJava" works OK for 
> the "\" issue, it isn't necessarily consistent with all YAML escapes, which 
> could lead to even weirder bugs/confusion down the road.
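
One possible way to make the escaping uniform, sketched here only as an illustration and not as what the attached patch does: YAML single-quoted scalars treat backslashes literally and only require a single quote to be doubled, so every templated property could go through one helper. The class and method names below are hypothetical, and the sketch assumes the placeholders in the template are not themselves wrapped in double quotes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only; names are hypothetical and not taken from the patch.
final class YamlTemplateEscaping {

  // Render a value as a YAML single-quoted scalar. Inside single quotes a
  // backslash is a literal character; the only escape is '' for a quote.
  static String yamlSingleQuoted(String value) {
    return "'" + value.replace("'", "''") + "'";
  }

  // Apply the same quoting to every template property, so that no single
  // variable needs special treatment.
  static Map<String, String> escapeAll(Map<String, String> props) {
    Map<String, String> out = new LinkedHashMap<>();
    props.forEach((key, value) -> out.put(key, yamlSingleQuoted(value)));
    return out;
  }
}
```

Single-quoted scalars cannot carry non-printable characters and fold embedded line breaks, so this would only suit simple values such as paths and version strings.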




[jira] [Updated] (SOLR-14889) improve templated variable escaping in ref-guide _config.yml

2020-09-24 Thread Dawid Weiss (Jira)



 [ 
https://issues.apache.org/jira/browse/SOLR-14889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Weiss updated SOLR-14889:
---
Attachment: SOLR-14889.patch

> improve templated variable escaping in ref-guide _config.yml
> 
>
> Key: SOLR-14889
> URL: https://issues.apache.org/jira/browse/SOLR-14889
> Project: Solr
>  Issue Type: Task
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: documentation
>Reporter: Chris M. Hostetter
>Assignee: Chris M. Hostetter
>Priority: Major
> Attachments: SOLR-14889.patch, SOLR-14889.patch, SOLR-14889.patch, 
> SOLR-14889.patch, SOLR-14889.patch
>
>
> SOLR-14824 ran into Windows failures when we switched from using a hardcoded 
> "relative" path to the solrRootPath to using groovy/project variables to get 
> the path.  The reason for the failures was that the path is used as a 
> variable templated into {{_config.yml.template}} to build the {{_config.yml}} 
> file, but on Windows the path separator '\' was being parsed by 
> jekyll/YAML as a string escape character.
> (This wasn't a problem we ran into before, even on Windows, prior to the 
> SOLR-14824 changes, because the hardcoded relative path only used '/' 
> delimiters, which (j)ruby was happy to work with, even on Windows.)
> As Uwe pointed out when hotfixing this...
> {quote}Problem was that backslashes are used to escape strings, but windows 
> paths also have those. Fix was to add StringEscapeUtils, but I don't like 
> this too much. Maybe we find a better solution to make special characters in 
> those properties escaped correctly when used in strings inside templates.
> {quote}
> ...the current fix of using {{StringEscapeUtils.escapeJava}} -- only for this 
> one variable -- doesn't really protect other variables that might have 
> special characters in them down the road, and while "escapeJava" works OK for 
> the "\" issue, it isn't necessarily consistent with all YAML escapes, which 
> could lead to even weirder bugs/confusion down the road.




[GitHub] [lucene-solr] dweiss commented on a change in pull request #1917: LUCENE-9535: Make ByteBuffersDataOutput#ramBytesUsed run in constant-time.

2020-09-24 Thread GitBox



dweiss commented on a change in pull request #1917:
URL: https://github.com/apache/lucene-solr/pull/1917#discussion_r494352956



##
File path: 
lucene/core/src/java/org/apache/lucene/store/ByteBuffersDataOutput.java
##
@@ -400,8 +400,13 @@ public void writeSetOfStrings(Set<String> set) {
   public long ramBytesUsed() {
 // Return a rough estimation for allocated blocks. Note that we do not make
 // any special distinction for direct memory buffers.
-return RamUsageEstimator.NUM_BYTES_OBJECT_REF * blocks.size() + 
-   blocks.stream().mapToLong(buf -> buf.capacity()).sum();
+ByteBuffer first = blocks.peek();
+if (first == null) {
+  return 0L;
+} else {
+  // All blocks have the same capacity.

Review comment:
   Thanks Adrien!






[jira] [Commented] (SOLR-14549) Listing of Files in a Directory on Solr Admin is Broken

2020-09-24 Thread David Eric Pugh (Jira)



[ 
https://issues.apache.org/jira/browse/SOLR-14549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17201565#comment-17201565
 ] 

David Eric Pugh commented on SOLR-14549:


Happy to QA your changes [~krisden], it definitely stumped me.

> Listing of Files in a Directory on Solr Admin is Broken
> ---
>
> Key: SOLR-14549
> URL: https://issues.apache.org/jira/browse/SOLR-14549
> Project: Solr
>  Issue Type: Bug
>  Components: Admin UI
>Affects Versions: master (9.0), 8.5.1, 8.5.2
>Reporter: David Eric Pugh
>Assignee: Kevin Risden
>Priority: Major
> Attachments: Screenshot at Jun 09 07-40-06.png
>
>
> The Admin interface for showing files only lets you see the top level files, 
> no nested files in a directory:
> http://localhost:8983/solr/#/gettingstarted/files?file=lang%2F
> Choosing a nested directory doesn't generate any console errors, but the tree 
> doesn't open.
> I believe this was introduced during SOLR-14209 upgrade in Jquery.




[GitHub] [lucene-solr] jpountz opened a new pull request #1919: Compute RAM usage ByteBuffersDataOutput on the fly.

2020-09-24 Thread GitBox



jpountz opened a new pull request #1919:
URL: https://github.com/apache/lucene-solr/pull/1919


   This helps remove the assumption that all blocks have the same size.
   




[GitHub] [lucene-solr] jpountz commented on pull request #1919: Compute RAM usage ByteBuffersDataOutput on the fly.

2020-09-24 Thread GitBox



jpountz commented on pull request #1919:
URL: https://github.com/apache/lucene-solr/pull/1919#issuecomment-698392573


   @dweiss FYI I could not find a way to have blocks of different capacities as 
we have an assertion that the allocator creates blocks of the expected 
capacity, not larger.




[jira] [Commented] (SOLR-14888) Echo directory to run Solr from for the "assemble" and "dev" targets in the Gradle build

2020-09-24 Thread Erick Erickson (Jira)



[ 
https://issues.apache.org/jira/browse/SOLR-14888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17201574#comment-17201574
 ] 

Erick Erickson commented on SOLR-14888:
---

One other thing that'd be good to do at the same time. If you execute "gradlew 
tasks", this line comes out:

{code}
dev - Assemble Solr distribution into 'development' folder at 
/Users/Erick/apache/solrJiras/master/solr/packaging/build/dev
{code}

It'd be helpful for that same "folder at..." to come out for the assemble 
target too.

> Echo directory to run Solr from for the "assemble" and "dev" targets in the 
> Gradle build
> 
>
> Key: SOLR-14888
> URL: https://issues.apache.org/jira/browse/SOLR-14888
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Erick Erickson
>Priority: Major
>
> This used to happen. As per [~mdrob] opening a JIRA.




[GitHub] [lucene-solr] mikemccand commented on a change in pull request #1893: LUCENE-9444 Utility class to get facet labels from taxonomy for a fac…

2020-09-24 Thread GitBox



mikemccand commented on a change in pull request #1893:
URL: https://github.com/apache/lucene-solr/pull/1893#discussion_r494395610



##
File path: 
lucene/facet/src/java/org/apache/lucene/facet/taxonomy/TaxonomyFacetLabels.java
##
@@ -0,0 +1,184 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.facet.taxonomy;
+
+import org.apache.lucene.facet.FacetsConfig;
+import org.apache.lucene.index.LeafReaderContext;
+import org.apache.lucene.util.IntsRef;
+
+import java.io.IOException;
+
+import static org.apache.lucene.facet.taxonomy.TaxonomyReader.INVALID_ORDINAL;
+import static org.apache.lucene.facet.taxonomy.TaxonomyReader.ROOT_ORDINAL;
+
+/**
+ * Utility class to easily retrieve previously indexed facet labels, allowing 
you to skip also adding stored fields for these values,
+ * reducing your index size.
+ *
+ * @lucene.experimental
+ **/
+public class TaxonomyFacetLabels {
+
+  /**
+   * Index field name provided to the constructor
+   */
+  private final String indexFieldName;
+
+  /**
+   * {@code TaxonomyReader} provided to the constructor
+   */
+  private final TaxonomyReader taxoReader;
+
+  /**
+   * {@code FacetsConfig} provided to the constructor
+   */
+  private final FacetsConfig config;
+
+  /**
+   * {@code OrdinalsReader} to decode ordinals previously indexed into the 
{@code BinaryDocValues} facet field
+   */
+  private final OrdinalsReader ordsReader;
+
+  /**
+   * Sole constructor.  Do not close the provided {@link TaxonomyReader} while 
still using this instance!
+   */
+  public TaxonomyFacetLabels(TaxonomyReader taxoReader, FacetsConfig config, 
String indexFieldName) throws IOException {
+this.taxoReader = taxoReader;
+this.config = config;
+this.indexFieldName = indexFieldName;
+this.ordsReader = new DocValuesOrdinalsReader(indexFieldName);
+  }
+
+  /**
+   * Create and return an instance of {@link FacetLabelReader} to retrieve 
facet labels for
+   * multiple documents and (optionally) for a specific dimension.  You must 
create this per-segment,
+   * and then step through all hits, in order, for that segment.
+   *
+   * NOTE: This class is not thread-safe, so you must use a new 
instance of this
+   * class for each thread.
+   *
+   * @param readerContext LeafReaderContext used to access the {@code 
BinaryDocValues} facet field
+   * @return an instance of {@link FacetLabelReader}
+   * @throws IOException when a low-level IO issue occurs
+   */
+  public FacetLabelReader getFacetLabelReader(LeafReaderContext readerContext) 
throws IOException {
+return new FacetLabelReader(ordsReader, readerContext);
+  }
+
+  /**
+   * Utility class to retrieve facet labels for multiple documents.
+   *
+   * @lucene.experimental
+   */
+  public class FacetLabelReader {
+private final OrdinalsReader.OrdinalsSegmentReader ordinalsSegmentReader;
+private final IntsRef decodedOrds = new IntsRef();
+private int currentDocId = -1;
+private int currentPos = -1;
+
+// Lazily set when nextFacetLabel(int docId, String facetDimension) is 
first called
+private int[] parents;
+
+/**
+ * Sole constructor.
+ */
+public FacetLabelReader(OrdinalsReader ordsReader, LeafReaderContext 
readerContext) throws IOException {
+  ordinalsSegmentReader = ordsReader.getReader(readerContext);
+}
+
+/**
+ * Retrieves the next {@link FacetLabel} for the specified {@code docId}, 
or {@code null} if there are no more.
+ * This method has state: if the provided {@code docId} is the same as the 
previous invocation, it returns the
+ * next {@link FacetLabel} for that document.  Otherwise, it advances to 
the new {@code docId} and provides the
+ * first {@link FacetLabel} for that document, or {@code null} if that 
document has no indexed facets.  Each
+ * new {@code docId} must be in strictly monotonic (increasing) order.
+ *
+ * @param docId input docId provided in monotonic (non-decreasing) order
+ * @return the first or next {@link FacetLabel}, or {@code null} if there 
are no more
+ * @throws IOException when a low-level IO issue occurs
+ */
+public FacetLabel nextFacetLabel(int docId) throws IOException

[GitHub] [lucene-solr] dweiss commented on pull request #1919: Compute RAM usage ByteBuffersDataOutput on the fly.

2020-09-24 Thread GitBox



dweiss commented on pull request #1919:
URL: https://github.com/apache/lucene-solr/pull/1919#issuecomment-698412271


   I will take another look. I can't remember forcing block capacity but it's 
been a while!




[GitHub] [lucene-solr] uschindler commented on pull request #1919: Compute RAM usage ByteBuffersDataOutput on the fly.

2020-09-24 Thread GitBox



uschindler commented on pull request #1919:
URL: https://github.com/apache/lucene-solr/pull/1919#issuecomment-698416370


   I like this approach more: just sum up the sizes as new blocks are 
allocated and added to the Deque.
   
   +1
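
A minimal sketch of that idea, for readers following the thread. It is not the code from the pull request: the class name, the `appendBlock`/`reset` methods, and the fixed reference size standing in for `RamUsageEstimator.NUM_BYTES_OBJECT_REF` are all assumptions made for illustration.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;

// Illustrative sketch only; not the actual ByteBuffersDataOutput implementation.
final class BlockList {
  // Assumed stand-in for RamUsageEstimator.NUM_BYTES_OBJECT_REF; the real
  // value depends on the JVM configuration.
  static final long ASSUMED_REF_BYTES = 8;

  private final ArrayDeque<ByteBuffer> blocks = new ArrayDeque<>();

  // Running total, updated whenever a block is added, so reading it is O(1)
  // and does not rely on all blocks having the same capacity.
  private long ramBytesUsed;

  void appendBlock(int capacity) {
    ByteBuffer block = ByteBuffer.allocate(capacity);
    blocks.addLast(block);
    ramBytesUsed += ASSUMED_REF_BYTES + block.capacity();
  }

  long ramBytesUsed() {
    return ramBytesUsed;
  }

  void reset() {
    blocks.clear();
    ramBytesUsed = 0;
  }
}
```

The running total matches the earlier per-call formula (one object reference per block plus the sum of block capacities) while avoiding the walk over the deque.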




[GitHub] [lucene-solr] mikemccand commented on a change in pull request #1893: LUCENE-9444 Utility class to get facet labels from taxonomy for a fac…

2020-09-24 Thread GitBox



mikemccand commented on a change in pull request #1893:
URL: https://github.com/apache/lucene-solr/pull/1893#discussion_r494398560



##
File path: lucene/facet/src/test/org/apache/lucene/facet/FacetTestCase.java
##
@@ -56,6 +60,28 @@ public Facets getTaxonomyFacetCounts(TaxonomyReader 
taxoReader, FacetsConfig con
 return facets;
   }
 
+  public List<List<FacetLabel>> getTaxonomyFacetLabels(TaxonomyReader 
taxoReader, FacetsConfig config, FacetsCollector fc) throws IOException {

Review comment:
   Thank you for adding this utility method so tests can easily use the new 
utility class!
   
   Can we rename this to `getAllTaxonomyFacetLabels`, and add javadoc 
explaining that the outer list is one entry per matched hit, and the inner list 
is one entry per `FacetLabel` belonging to that hit?

##
File path: 
lucene/facet/src/test/org/apache/lucene/facet/taxonomy/TestTaxonomyFacetCounts.java
##
@@ -726,6 +743,39 @@ public void testRandom() throws Exception {
 IOUtils.close(tw, searcher.getIndexReader(), tr, indexDir, taxoDir);
   }
 
+  private static List<List<FacetLabel>> 
sortedFacetLabels(List<List<FacetLabel>> allfacetLabels) {
+for (List<FacetLabel> facetLabels : allfacetLabels) {
+  Collections.sort(facetLabels);
+}
+
+Collections.sort(allfacetLabels, (o1, o2) -> {
+  if (o1 == null) {

Review comment:
   Hmm why are these `null` checks necessary?  Are we really seeing `null` 
in the argument?  Oh, I guess this legitimately happens when the hit had no 
facets?  Maybe add a comment?  Hmm, actually, looking at how actual and 
expected are populated, neither of them seems to add `null`?  One of them 
filters out empty list but the other does not?

##
File path: 
lucene/facet/src/test/org/apache/lucene/facet/taxonomy/TestTaxonomyFacetCounts.java
##
@@ -711,6 +723,11 @@ public void testRandom() throws Exception {
 }
   }
 
+  // Test facet labels for each matching test doc
+  List<List<FacetLabel>> actualLabels = getTaxonomyFacetLabels(tr, config, 
fc);
+  assertEquals(expectedLabels.size(), actualLabels.size());

Review comment:
   Hmm I think `expectedLabels` filters out empty `List<FacetLabel>` but 
`actualLabels` does not, so this might false trip?

##
File path: 
lucene/facet/src/test/org/apache/lucene/facet/taxonomy/TestTaxonomyFacetCounts.java
##
@@ -726,6 +743,39 @@ public void testRandom() throws Exception {
 IOUtils.close(tw, searcher.getIndexReader(), tr, indexDir, taxoDir);
   }
 
+  private static List<List<FacetLabel>> 
sortedFacetLabels(List<List<FacetLabel>> allfacetLabels) {
+for (List<FacetLabel> facetLabels : allfacetLabels) {
+  Collections.sort(facetLabels);
+}
+
+Collections.sort(allfacetLabels, (o1, o2) -> {

Review comment:
   I'm confused why we are sorting the top list?  Isn't the top list in 
order of the hits?  And we want to confirm, for a given `docId` hit, that 
expected and actual labels match?
   
   OK, I think I understand: this test does not index anything allowing you to 
track which original doc mapped to which `FacetLabel`, so then you cannot know, 
per segment, which docs ended up where :)
   
   Given that, I think it's OK to do the top-level sort of all 
`List<FacetLabel>` across all hits.
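
For reference, the order-insensitive comparison discussed above could be sketched as follows. This is not the test code from the pull request: `String` stands in for `FacetLabel`, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrative sketch only; String is a stand-in for FacetLabel.
final class LabelListSorting {

  // Lexicographic comparison of two already-sorted lists; when one list is a
  // prefix of the other, the shorter one sorts first.
  static int compareLists(List<String> a, List<String> b) {
    int n = Math.min(a.size(), b.size());
    for (int i = 0; i < n; i++) {
      int c = a.get(i).compareTo(b.get(i));
      if (c != 0) {
        return c;
      }
    }
    return Integer.compare(a.size(), b.size());
  }

  // Returns a canonically ordered copy: each inner list is sorted, then the
  // outer list is sorted, so two results can be compared with equals()
  // regardless of hit order.
  static List<List<String>> canonical(List<List<String>> all) {
    List<List<String>> copy = new ArrayList<>();
    for (List<String> labels : all) {
      List<String> sorted = new ArrayList<>(labels);
      Collections.sort(sorted);
      copy.add(sorted);
    }
    copy.sort(LabelListSorting::compareLists);
    return copy;
  }
}
```

Because empty lists are ordinary values here, no `null` checks are needed in the comparator, provided both expected and actual either keep or drop hits without facets.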






[GitHub] [lucene-solr] cpoerschke commented on pull request #1890: Rename ConfigSetsAPITest to TestConfigSetsAPISolrCloud

2020-09-24 Thread GitBox



cpoerschke commented on pull request #1890:
URL: https://github.com/apache/lucene-solr/pull/1890#issuecomment-698439045


   > ... if you could give me a couple days ... #1892
   
   Sure, no problem at all.
   
   Now that `TestConfigSetsAPI` extends `SolrCloudTestCase` too, I wonder
   * if `TestConfigSetsAPISolrCloud` would still be a good replacement for 
`ConfigSetsAPITest`, or
   * if adding the `ConfigSetsAPITest` functionality to `TestConfigSetsAPI` 
might be better?
   
   (Haven't looked at details as yet.)




[GitHub] [lucene-solr] cpoerschke merged pull request #1825: SOLR-14828: reduce 'error' logging noise in BaseCloudSolrClient.requestWithRetryOnStaleState

2020-09-24 Thread GitBox



cpoerschke merged pull request #1825:
URL: https://github.com/apache/lucene-solr/pull/1825


   




[jira] [Commented] (SOLR-14828) reduce 'error' logging noise in BaseCloudSolrClient.requestWithRetryOnStaleState

2020-09-24 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/SOLR-14828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17201612#comment-17201612
 ] 

ASF subversion and git services commented on SOLR-14828:


Commit 876de8be41a837b83ef7ea6b82b322ed829b0595 in lucene-solr's branch 
refs/heads/master from Christine Poerschke
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=876de8b ]

SOLR-14828: reduce 'error' logging noise in 
BaseCloudSolrClient.requestWithRetryOnStaleState (#1825)



> reduce 'error' logging noise in 
> BaseCloudSolrClient.requestWithRetryOnStaleState
> 
>
> Key: SOLR-14828
> URL: https://issues.apache.org/jira/browse/SOLR-14828
> Project: Solr
>  Issue Type: Task
>  Components: SolrJ
>Reporter: Christine Poerschke
>Assignee: Christine Poerschke
>Priority: Minor
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Currently -- e.g. 
> https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.6.2/solr/solrj/src/java/org/apache/solr/client/solrj/impl/BaseCloudSolrClient.java#L960-L961
>  -- an error is logged even if request retrying will happen (and hopefully 
> succeed).
> This task proposes to 'info' or 'warn' rather than 'error' log if the request 
> will be retried.
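
The proposal can be illustrated with a small sketch. It is not the committed change: it uses `java.util.logging` only to stay self-contained (Solr itself logs through SLF4J), and the class, the method names, and the retry-count parameters are assumptions.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Illustrative sketch only; not the BaseCloudSolrClient code.
final class RetryLogging {
  private static final Logger LOG = Logger.getLogger(RetryLogging.class.getName());

  // A failure that will be retried is logged as a warning; only the final,
  // non-retried failure is logged as an error (SEVERE in java.util.logging).
  static Level levelFor(boolean willRetry) {
    return willRetry ? Level.WARNING : Level.SEVERE;
  }

  static void logFailure(Exception e, int retryCount, int maxRetries) {
    boolean willRetry = retryCount < maxRetries;
    LOG.log(levelFor(willRetry),
        "Request failed (attempt " + (retryCount + 1) + "), will retry: " + willRetry, e);
  }
}
```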




[GitHub] [lucene-solr] cpoerschke merged pull request #1913: SOLR-11167: Avoid $SOLR_STOP_WAIT use during 'bin/solr start' if $SOLR_START_WAIT is supplied.

2020-09-24 Thread GitBox



cpoerschke merged pull request #1913:
URL: https://github.com/apache/lucene-solr/pull/1913


   




[jira] [Commented] (SOLR-11167) bin/solr uses $SOLR_STOP_WAIT during start

2020-09-24 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/SOLR-11167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17201615#comment-17201615
 ] 

ASF subversion and git services commented on SOLR-11167:


Commit ea77d242377d942912525f76c307de568c2b3d90 in lucene-solr's branch 
refs/heads/master from Christine Poerschke
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=ea77d24 ]

SOLR-11167: Avoid $SOLR_STOP_WAIT use during 'bin/solr start' if 
$SOLR_START_WAIT is supplied. (#1913)



> bin/solr uses $SOLR_STOP_WAIT during start
> --
>
> Key: SOLR-11167
> URL: https://issues.apache.org/jira/browse/SOLR-11167
> Project: Solr
>  Issue Type: Improvement
>  Components: scripts and tools
>Reporter: Christine Poerschke
>Assignee: Christine Poerschke
>Priority: Minor
> Fix For: master (9.0), 8.7
>
> Attachments: SOLR-11167.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> bin/solr using $SOLR_STOP_WAIT during start is unexpected, I think it would 
> be clearer to have a separate $SOLR_START_WAIT variable.
> related minor thing: SOLR_STOP_WAIT is mentioned in solr.in.sh but not in 
> solr.in.cmd equivalent.




[GitHub] [lucene-solr] munendrasn merged pull request #1914: Move 9x upgrade notes out of changes.txt

2020-09-24 Thread GitBox



munendrasn merged pull request #1914:
URL: https://github.com/apache/lucene-solr/pull/1914


   




[jira] [Commented] (SOLR-11167) bin/solr uses $SOLR_STOP_WAIT during start

2020-09-24 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/SOLR-11167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17201622#comment-17201622
 ] 

ASF subversion and git services commented on SOLR-11167:


Commit 38ab92da8b6649e4232e4d7fa6833b5cebdff993 in lucene-solr's branch 
refs/heads/branch_8x from Christine Poerschke
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=38ab92d ]

SOLR-11167: Avoid $SOLR_STOP_WAIT use during 'bin/solr start' if 
$SOLR_START_WAIT is supplied. (#1913)

Resolved Conflicts:
solr/CHANGES.txt


> bin/solr uses $SOLR_STOP_WAIT during start
> --
>
> Key: SOLR-11167
> URL: https://issues.apache.org/jira/browse/SOLR-11167
> Project: Solr
>  Issue Type: Improvement
>  Components: scripts and tools
>Reporter: Christine Poerschke
>Assignee: Christine Poerschke
>Priority: Minor
> Fix For: master (9.0), 8.7
>
> Attachments: SOLR-11167.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> bin/solr using $SOLR_STOP_WAIT during start is unexpected, I think it would 
> be clearer to have a separate $SOLR_START_WAIT variable.
> related minor thing: SOLR_STOP_WAIT is mentioned in solr.in.sh but not in 
> solr.in.cmd equivalent.




[jira] [Commented] (SOLR-14828) reduce 'error' logging noise in BaseCloudSolrClient.requestWithRetryOnStaleState

2020-09-24 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/SOLR-14828?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17201621#comment-17201621
 ] 

ASF subversion and git services commented on SOLR-14828:


Commit eca8aa81718d025e215b21c9e81b4b4620ec8f1e in lucene-solr's branch 
refs/heads/branch_8x from Christine Poerschke
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=eca8aa8 ]

SOLR-14828: reduce 'error' logging noise in 
BaseCloudSolrClient.requestWithRetryOnStaleState (#1825)



> reduce 'error' logging noise in 
> BaseCloudSolrClient.requestWithRetryOnStaleState
> 
>
> Key: SOLR-14828
> URL: https://issues.apache.org/jira/browse/SOLR-14828
> Project: Solr
>  Issue Type: Task
>  Components: SolrJ
>Reporter: Christine Poerschke
>Assignee: Christine Poerschke
>Priority: Minor
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Currently -- e.g. 
> https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.6.2/solr/solrj/src/java/org/apache/solr/client/solrj/impl/BaseCloudSolrClient.java#L960-L961
>  -- an error is logged even if request retrying will happen (and hopefully 
> succeed).
> This task proposes to 'info' or 'warn' rather than 'error' log if the request 
> will be retried.




[GitHub] [lucene-solr] munendrasn commented on pull request #1371: SOLR-14333: print readable version of CollapsedPostFilter query

2020-09-24 Thread GitBox



munendrasn commented on pull request #1371:
URL: https://github.com/apache/lucene-solr/pull/1371#issuecomment-698447210


   Fixed the failing tests and updated changes to include deprecation




[GitHub] [lucene-solr] cpoerschke opened a new pull request #1920: branch_8x: add two missing(?) solr/CHANGES.txt entries

2020-09-24 Thread GitBox



cpoerschke opened a new pull request #1920:
URL: https://github.com/apache/lucene-solr/pull/1920


   Encountered a cherry-pick merge conflict and it seems that these two entries 
are present in the master branch's solr/CHANGES.txt 8.7 section but 
(unintentionally?) missing in the branch_8x solr/CHANGES.txt 8.7 section.




[jira] [Resolved] (SOLR-14828) reduce 'error' logging noise in BaseCloudSolrClient.requestWithRetryOnStaleState

2020-09-24 Thread Christine Poerschke (Jira)



 [ 
https://issues.apache.org/jira/browse/SOLR-14828?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christine Poerschke resolved SOLR-14828.

Fix Version/s: 8.7
   master (9.0)
   Resolution: Fixed

> reduce 'error' logging noise in 
> BaseCloudSolrClient.requestWithRetryOnStaleState
> 
>
> Key: SOLR-14828
> URL: https://issues.apache.org/jira/browse/SOLR-14828
> Project: Solr
>  Issue Type: Task
>  Components: SolrJ
>Reporter: Christine Poerschke
>Assignee: Christine Poerschke
>Priority: Minor
> Fix For: master (9.0), 8.7
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Currently -- e.g. 
> https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.6.2/solr/solrj/src/java/org/apache/solr/client/solrj/impl/BaseCloudSolrClient.java#L960-L961
>  -- an error is logged even if request retrying will happen (and hopefully 
> succeed).
> This task proposes to 'info' or 'warn' rather than 'error' log if the request 
> will be retried.




[jira] [Resolved] (SOLR-11167) bin/solr uses $SOLR_STOP_WAIT during start

2020-09-24 Thread Christine Poerschke (Jira)



 [ 
https://issues.apache.org/jira/browse/SOLR-11167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christine Poerschke resolved SOLR-11167.

Resolution: Fixed

> bin/solr uses $SOLR_STOP_WAIT during start
> --
>
> Key: SOLR-11167
> URL: https://issues.apache.org/jira/browse/SOLR-11167
> Project: Solr
>  Issue Type: Improvement
>  Components: scripts and tools
>Reporter: Christine Poerschke
>Assignee: Christine Poerschke
>Priority: Minor
> Fix For: master (9.0), 8.7
>
> Attachments: SOLR-11167.patch
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> bin/solr using $SOLR_STOP_WAIT during start is unexpected, I think it would 
> be clearer to have a separate $SOLR_START_WAIT variable.
> related minor thing: SOLR_STOP_WAIT is mentioned in solr.in.sh but not in 
> solr.in.cmd equivalent.




[GitHub] [lucene-solr] tflobbe commented on pull request #1920: branch_8x: add two missing(?) solr/CHANGES.txt entries

2020-09-24 Thread GitBox



tflobbe commented on pull request #1920:
URL: https://github.com/apache/lucene-solr/pull/1920#issuecomment-698457823


   ugh, looks like I forgot to backport the CHANGES entry in my change. Thanks 
Christine.




[GitHub] [lucene-solr] munendrasn merged pull request #1900: SOLR-14036: Remove explicit distrib=false from /terms handler

2020-09-24 Thread GitBox



munendrasn merged pull request #1900:
URL: https://github.com/apache/lucene-solr/pull/1900


   




[jira] [Commented] (SOLR-14036) TermsComponent distributed search (shards) doesn't work with SolrCloud

2020-09-24 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/SOLR-14036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17201648#comment-17201648
 ] 

ASF subversion and git services commented on SOLR-14036:


Commit ac5847231017f12d6a51348e1cdbd50c9732a224 in lucene-solr's branch 
refs/heads/master from Munendra S N
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=ac58472 ]

SOLR-14036: Remove explicit distrib=false from /terms handler (#1900)

* Remove distrib=false from /terms handler so that terms are returned from 
across all shards instead of a single local shard.
* cleanup shards parameter handling in TermsComponent. This is handled in 
HttpShardHandler
* Remove redundant tests for shard whitelist
* remove redundant terms params from ScoreNodeStream


> TermsComponent distributed search (shards) doesn't work with SolrCloud
> --
>
> Key: SOLR-14036
> URL: https://issues.apache.org/jira/browse/SOLR-14036
> Project: Solr
>  Issue Type: Improvement
>  Components: SearchComponents - other
>Reporter: David Smiley
>Assignee: Munendra S N
>Priority: Major
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> My colleagues [~bruno.roustant] and [~antogruz] attempted to use the 
> {{TermsComponent}} in SolrCloud on a collection with multiple shards.  The 
> results were inconsistent depending on which shard the client was talking 
> with.  Looking at the prepare() method, I can see this component reads the 
> "shards" param.  It should not have been coded that way; the SearchHandler or 
> related machinery is responsible for parsing/processing that.




[jira] [Updated] (SOLR-14036) TermsComponent distributed search (shards) doesn't work with SolrCloud

2020-09-24 Thread Munendra S N (Jira)



 [ 
https://issues.apache.org/jira/browse/SOLR-14036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Munendra S N updated SOLR-14036:

Fix Version/s: master (9.0)
   Resolution: Fixed  (was: Invalid)
   Status: Resolved  (was: Patch Available)

> TermsComponent distributed search (shards) doesn't work with SolrCloud
> --
>
> Key: SOLR-14036
> URL: https://issues.apache.org/jira/browse/SOLR-14036
> Project: Solr
>  Issue Type: Improvement
>  Components: SearchComponents - other
>Reporter: David Smiley
>Assignee: Munendra S N
>Priority: Major
> Fix For: master (9.0)
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> My colleagues [~bruno.roustant] and [~antogruz] attempted to use the 
> {{TermsComponent}} in SolrCloud on a collection with multiple shards.  The 
> results were inconsistent depending on which shard the client was talking 
> with.  Looking at the prepare() method, I can see this component reads the 
> "shards" param.  It should not have been coded that way; the SearchHandler or 
> related machinery is responsible for parsing/processing that.




[jira] [Commented] (SOLR-14889) improve templated variable escaping in ref-guide _config.yml

2020-09-24 Thread Chris M. Hostetter (Jira)



[ 
https://issues.apache.org/jira/browse/SOLR-14889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17201659#comment-17201659
 ] 

Chris M. Hostetter commented on SOLR-14889:
---

thanks guys ... this is very enlightening.
{quote}The "doFirst" and "doLast" clauses in tasks add an anonymous closure to 
the list of things to run in execution phase. So even if a script looks linear, 
the closure in doFirst and doLast runs at a completely different time than 
what's outside of it.
{quote}
doFirst/doLast are actually the most straightforward aspect of the gradle 
lifecycle as far as i can tell : ) ... what's blowing my mind is what Uwe 
pointed out, about how during "configuration" the task execution code evidently 
... "captures" (for lack of a better word) ... references to the 
variables/input used by that code, so that _re-assigning_ to those variables in 
doFirst had no effect, but modifying the objects those variables pointed to did.

...that's wild...

... and nothing i'd seen in any of the gradle tutorials/lifecycle 
discussion really prepared me for that.

I have a few lingering questions, mostly about some of the ancillary stuff, in 
the latest patch (i didn't dig into who changed what) ...
 * can we make `templateProps` final now? ... i didn't realize groovy supported 
final as a keyword, and seems like a good idea for as many things as possible 
to be final
 ** should we use `asImmutable()` which is apparently a groovy add-on for 
maps/collections?
 * The "TODO 2" comment is stale and makes sense to remove, but shouldn't the 
"TODO 1" and "TODO 3" comments stick around? ... those are still applicable 
aren't they?
 * why is buildSiteJekyll now hooked into the "assemble" task?
 ** my understanding is that "assemble" is for building the lucene/solr 
artifacts/distribution – but the ref-guide shouldn't be included in that, we 
don't "release" it officially
 ** even if it does make sense to hook into "assemble" why is it hooking 
directly to buildSiteJekyll and not buildSite ?
 *** if that was to avoid the "check" style validation `buildSite` does ok, but 
as discussed in SOLR-14870 the way forward there seemed to be to pull it out of 
buildSite into its own "checkSite" task .. "buildSite" is the main task people 
should know about/run ... buildSiteJekyll (as a task name) is an implementation 
detail that should really go away

> improve templated variable escaping in ref-guide _config.yml
> 
>
> Key: SOLR-14889
> URL: https://issues.apache.org/jira/browse/SOLR-14889
> Project: Solr
>  Issue Type: Task
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: documentation
>Reporter: Chris M. Hostetter
>Assignee: Chris M. Hostetter
>Priority: Major
> Attachments: SOLR-14889.patch, SOLR-14889.patch, SOLR-14889.patch, 
> SOLR-14889.patch, SOLR-14889.patch
>
>
> SOLR-14824 ran into windows failures when we switched from using a hardcoded 
> "relative" path to the solrRootPath to using groovy/project variables to get 
> the path.  The reason for the failures was that the path is used as a 
> variable templated into {{_config.yml.template}} to build the {{_config.yml}} 
> file, but on windows the path separator '\' was being parsed by 
> jekyll/YAML as a string escape character.
> (This wasn't a problem we ran into before, even on windows, prior to the 
> SOLR-14824 changes, because the hardcoded relative path only used '/' 
> delimiters, which (j)ruby was happy to work with, even on windows.)
> As Uwe pointed out when hotfixing this...
> {quote}Problem was that backslashes are used to escape strings, but windows 
> paths also have those. Fix was to add StringEscapeUtils, but I don't like 
> this too much. Maybe we find a better solution to make special characters in 
> those properties escaped correctly when used in strings inside templates.
> {quote}
> ...the current fix of using {{StringEscapeUtils.escapeJava}} -- only for this 
> one variable -- doesn't really protect other variables that might have 
> special characters in them down the road, and while "escapeJava" works ok for 
> the "\" issue, it isn't necessarily consistent with all YAML escapes, which 
> could lead to even weirder bugs/confusion down the road.
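
One possible direction, sketched here under assumptions rather than taken from any attached patch: YAML single-quoted scalars treat backslashes literally, and the only character that needs escaping inside them is the single quote itself (by doubling it). The helper name `YamlQuoting.singleQuoted` and the `solr-root-path` key are hypothetical, and the sketch assumes single-line values.

```java
// Hypothetical helper for illustration; not part of the Solr build.
class YamlQuoting {
  // Wraps a single-line value as a YAML single-quoted scalar. Inside single
  // quotes a backslash is an ordinary character; only the quote character
  // itself needs escaping, which is done by doubling it.
  static String singleQuoted(String value) {
    return "'" + value.replace("'", "''") + "'";
  }

  public static void main(String[] args) {
    // A Windows-style path keeps its backslashes untouched.
    System.out.println("solr-root-path: " + singleQuoted("C:\\work\\lucene-solr\\solr"));
    // An embedded single quote is doubled.
    System.out.println("title: " + singleQuoted("Solr's Ref Guide"));
  }
}
```

With this approach the template would emit the already-quoted value, instead of placing the raw value between double quotes where `\` starts an escape sequence.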




[jira] [Commented] (SOLR-14843) Define strongly-typed cluster configuration API

2020-09-24 Thread Tomas Eduardo Fernandez Lobbe (Jira)



[ 
https://issues.apache.org/jira/browse/SOLR-14843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17201668#comment-17201668
 ] 

Tomas Eduardo Fernandez Lobbe commented on SOLR-14843:
--

I don’t know what Andrzej was thinking when he created this Jira, but what 
I thought when I saw it was something like:

The “consumer” side (our code, components, etc) could look something like:

{code:java}
int myInt = Config.getInteger("some.configurable.thing", /* default */ 30);
String myStr = Config.getString("some.configurable.string", /* default */ "foo");
MyObject myObj = Config.get("some.configurable.obj", new SomeSortOfFactory());
{code}

Maybe even support attaching an onChange listener, like
{code:java}
int myInt = Config.getInteger("some.configurable.thing", /* default */ 30,
    /* onChange */ (v) -> { setMyInt(v); refresh(); });
{code}
or something. 

Then, this {{Config}} class could load the configuration from a predictable 
hierarchy, something like:
{noformat}
system props > env > cluster props > node props
{noformat}
(don’t know if that’s the right order, and again, there could be more than one 
hierarchy), so that a property can be set in the node configuration, but could 
be overridden by collection level properties, etc. 

One extra nice thing of an approach like this is that we could have an API to 
show exactly the current configuration and where each config is coming from, 
something like:
{code}
some.configurable.string: {
  value: "bar",
  source: "collection property"
}

some.configurable.thing: {
  value: 30,
  source: "default"
}
{code}
Maybe even a timestamp of the change or something.
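
A rough sketch of how such a lookup could report where each value came from. Everything here is hypothetical (the `LayeredConfig` class, the layer names, the precedence order); none of it exists in Solr, and the layers are plain in-memory maps rather than system properties or ZK-backed config.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch; none of these names exist in Solr.
class LayeredConfig {
  // A resolved value together with the name of the layer that supplied it.
  static final class Resolved {
    final String value;
    final String source;

    Resolved(String value, String source) {
      this.value = value;
      this.source = source;
    }
  }

  // Layers in precedence order: the first layer that contains a key wins.
  private final Map<String, Map<String, String>> layers = new LinkedHashMap<>();

  LayeredConfig addLayer(String name, Map<String, String> props) {
    layers.put(name, props);
    return this;
  }

  // Walks the layers in order and falls back to the default if none has the key.
  Resolved resolve(String key, String defaultValue) {
    for (Map.Entry<String, Map<String, String>> layer : layers.entrySet()) {
      String value = layer.getValue().get(key);
      if (value != null) {
        return new Resolved(value, layer.getKey());
      }
    }
    return new Resolved(defaultValue, "default");
  }

  // Typed accessor built on top of the string lookup.
  int getInteger(String key, int defaultValue) {
    return Integer.parseInt(resolve(key, Integer.toString(defaultValue)).value);
  }

  public static void main(String[] args) {
    LayeredConfig config = new LayeredConfig()
        .addLayer("system property", Map.of())
        .addLayer("cluster property", Map.of("some.configurable.string", "bar"))
        .addLayer("node property", Map.of("some.configurable.string", "foo"));

    Resolved r = config.resolve("some.configurable.string", "baz");
    System.out.println(r.value + " (from " + r.source + ")");
    System.out.println(config.getInteger("some.configurable.thing", 30));
  }
}
```

Keeping the source next to the value is what would make a "show me the effective configuration" API cheap to build on top.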


> Define strongly-typed cluster configuration API
> ---
>
> Key: SOLR-14843
> URL: https://issues.apache.org/jira/browse/SOLR-14843
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Andrzej Bialecki
>Priority: Major
>  Labels: clean-api
> Fix For: master (9.0)
>
>
> Current cluster-level configuration uses a hodgepodge of traditional Solr 
> config sources (solr.xml, system properties) and the new somewhat arbitrary 
> config files kept in ZK ({{/clusterprops.json, /security.json, 
> /packages.json, /autoscaling.json}} etc...). There's no uniform 
> strongly-typed API to access and manage these configs - currently each config 
> source has its own CRUD, often relying on direct access to Zookeeper. There's 
> also no uniform method for monitoring changes to these config sources.
> This issue proposes a uniform config API facade with the following 
> characteristics:
>  * Using a single hierarchical (or at least key-based) facade for accessing 
> any global config.
>  * Using strongly-typed sub-system configs instead of opaque Map-s: 
> components would no longer deal with JSON parsing/writing, instead they would 
> use properly annotated Java objects for config CRUD. Config objects would 
> include versioning information (eg. lastModified timestamp).
>  * Isolating access to the underlying config persistence layer: components 
> would no longer directly interact with Zookeeper or files. Most likely the 
> default implementation would continue using different ZK files per-subsystem 
> in order to limit the complexity of file formats and to reduce the cost of 
> notifications for unmodified parts of the configs.
>  * Providing uniform way to register listeners for monitoring changes in 
> specific configs: components would no longer need to interact with ZK 
> watches, they would instead be notified about modified configs that they are 
> interested in.




[jira] [Commented] (SOLR-14503) Solr does not respect waitForZk (SOLR_WAIT_FOR_ZK) property

2020-09-24 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/SOLR-14503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17201682#comment-17201682
 ] 

ASF subversion and git services commented on SOLR-14503:


Commit ddd10725b00649edc80726c59f9fdf0442adb6c2 in lucene-solr's branch 
refs/heads/master from Munendra S N
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=ddd1072 ]

SOLR-14503: use specified waitForZk val as conn timeout for zk

* Also, consume SOLR_WAIT_FOR_ZK in bin/solr.cmd


> Solr does not respect waitForZk (SOLR_WAIT_FOR_ZK) property
> ---
>
> Key: SOLR-14503
> URL: https://issues.apache.org/jira/browse/SOLR-14503
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 7.1, 7.2, 7.2.1, 7.3, 7.3.1, 7.4, 7.5, 7.6, 7.7, 7.7.1, 
> 7.7.2, 8.0, 8.1, 8.2, 7.7.3, 8.1.1, 8.3, 8.4, 8.3.1, 8.5, 8.4.1, 8.5.1
>Reporter: Colvin Cowie
>Assignee: Munendra S N
>Priority: Minor
> Attachments: SOLR-14503.patch, SOLR-14503.patch
>
>
> When starting Solr in cloud mode, if zookeeper is not available within 30 
> seconds, then core container initialization fails and the node will not 
> recover when zookeeper is available.
>  
> I believe SOLR-5129 should have addressed this issue, however it doesn't 
> quite do so for two reasons:
>  # 
> [https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/servlet/SolrDispatchFilter.java#L297]
>  it calls {{SolrZkClient(String zkServerAddress, int zkClientTimeout)}} 
> rather than {{SolrZkClient(String zkServerAddress, int zkClientTimeout, int 
> zkClientConnectTimeout)}} so the DEFAULT_CLIENT_CONNECT_TIMEOUT of 30 seconds 
> is used even when you specify a different waitForZk value
>  # bin/solr contains script to set -DwaitForZk from the SOLR_WAIT_FOR_ZK 
> environment property 
> [https://github.com/apache/lucene-solr/blob/master/solr/bin/solr#L2148] but 
> there is no corresponding assignment in bin/solr.cmd, while SOLR_WAIT_FOR_ZK 
> appears in the solr.in.cmd as an example.
>  
> I will attach a patch that fixes the above.
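
A simplified illustration of the first point above, using a stand-in class (`ZkClientSketch`, not the real `SolrZkClient`) and assuming the timeouts are expressed in milliseconds: the two-argument constructor quietly substitutes the default connect timeout, so a caller that has a configured wait value must use the three-argument form for it to take effect.

```java
// Simplified stand-in for illustration; not the real SolrZkClient.
class ZkClientSketch {
  // Assumption: 30 seconds, expressed in milliseconds.
  static final int DEFAULT_CLIENT_CONNECT_TIMEOUT = 30000;

  final int clientTimeout;
  final int connectTimeout;

  // Two-argument form: falls back to the default connect timeout.
  ZkClientSketch(String zkServerAddress, int zkClientTimeout) {
    this(zkServerAddress, zkClientTimeout, DEFAULT_CLIENT_CONNECT_TIMEOUT);
  }

  // Three-argument form: the caller decides how long to wait for a connection.
  ZkClientSketch(String zkServerAddress, int zkClientTimeout, int zkClientConnectTimeout) {
    this.clientTimeout = zkClientTimeout;
    this.connectTimeout = zkClientConnectTimeout;
  }

  public static void main(String[] args) {
    int waitForZkSeconds = 120; // e.g. the user-specified wait value

    ZkClientSketch ignoresSetting = new ZkClientSketch("localhost:2181", 15000);
    ZkClientSketch honoursSetting =
        new ZkClientSketch("localhost:2181", 15000, waitForZkSeconds * 1000);

    System.out.println("two-arg connect timeout:   " + ignoresSetting.connectTimeout);
    System.out.println("three-arg connect timeout: " + honoursSetting.connectTimeout);
  }
}
```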




[jira] [Commented] (SOLR-14503) Solr does not respect waitForZk (SOLR_WAIT_FOR_ZK) property

2020-09-24 Thread Munendra S N (Jira)



[ 
https://issues.apache.org/jira/browse/SOLR-14503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17201683#comment-17201683
 ] 

Munendra S N commented on SOLR-14503:
-

Beasted the test before committing, will shortly backport to 8x
{code:java}
./gradlew -p solr/core beast -Ptests.dups=10 --tests ZkFailoverTest
{code}


> Solr does not respect waitForZk (SOLR_WAIT_FOR_ZK) property
> ---
>
> Key: SOLR-14503
> URL: https://issues.apache.org/jira/browse/SOLR-14503
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 7.1, 7.2, 7.2.1, 7.3, 7.3.1, 7.4, 7.5, 7.6, 7.7, 7.7.1, 
> 7.7.2, 8.0, 8.1, 8.2, 7.7.3, 8.1.1, 8.3, 8.4, 8.3.1, 8.5, 8.4.1, 8.5.1
>Reporter: Colvin Cowie
>Assignee: Munendra S N
>Priority: Minor
> Attachments: SOLR-14503.patch, SOLR-14503.patch
>
>
> When starting Solr in cloud mode, if zookeeper is not available within 30 
> seconds, then core container initialization fails and the node will not 
> recover when zookeeper is available.
>  
> I believe SOLR-5129 should have addressed this issue, however it doesn't 
> quite do so for two reasons:
>  # 
> [https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/servlet/SolrDispatchFilter.java#L297]
>  it calls {{SolrZkClient(String zkServerAddress, int zkClientTimeout)}} 
> rather than {{SolrZkClient(String zkServerAddress, int zkClientTimeout, int 
> zkClientConnectTimeout)}} so the DEFAULT_CLIENT_CONNECT_TIMEOUT of 30 seconds 
> is used even when you specify a different waitForZk value
>  # bin/solr contains script to set -DwaitForZk from the SOLR_WAIT_FOR_ZK 
> environment property 
> [https://github.com/apache/lucene-solr/blob/master/solr/bin/solr#L2148] but 
> there is no corresponding assignment in bin/solr.cmd, while SOLR_WAIT_FOR_ZK 
> appears in the solr.in.cmd as an example.
>  
> I will attach a patch that fixes the above.




[jira] [Commented] (SOLR-14503) Solr does not respect waitForZk (SOLR_WAIT_FOR_ZK) property

2020-09-24 Thread ASF subversion and git services (Jira)



[ 
https://issues.apache.org/jira/browse/SOLR-14503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17201692#comment-17201692
 ] 

ASF subversion and git services commented on SOLR-14503:


Commit 894f91100d3bc1eab8332c9066222d99572393a3 in lucene-solr's branch 
refs/heads/branch_8x from Munendra S N
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=894f911 ]

SOLR-14503: use specified waitForZk val as conn timeout for zk

* Also, consume SOLR_WAIT_FOR_ZK in bin/solr.cmd


> Solr does not respect waitForZk (SOLR_WAIT_FOR_ZK) property
> ---
>
> Key: SOLR-14503
> URL: https://issues.apache.org/jira/browse/SOLR-14503
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 7.1, 7.2, 7.2.1, 7.3, 7.3.1, 7.4, 7.5, 7.6, 7.7, 7.7.1, 
> 7.7.2, 8.0, 8.1, 8.2, 7.7.3, 8.1.1, 8.3, 8.4, 8.3.1, 8.5, 8.4.1, 8.5.1
>Reporter: Colvin Cowie
>Assignee: Munendra S N
>Priority: Minor
> Attachments: SOLR-14503.patch, SOLR-14503.patch
>
>
> When starting Solr in cloud mode, if zookeeper is not available within 30 
> seconds, then core container initialization fails and the node will not 
> recover when zookeeper is available.
>  
> I believe SOLR-5129 should have addressed this issue, however it doesn't 
> quite do so for two reasons:
>  # 
> [https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/servlet/SolrDispatchFilter.java#L297]
>  it calls {{SolrZkClient(String zkServerAddress, int zkClientTimeout)}} 
> rather than {{SolrZkClient(String zkServerAddress, int zkClientTimeout, int 
> zkClientConnectTimeout)}} so the DEFAULT_CLIENT_CONNECT_TIMEOUT of 30 seconds 
> is used even when you specify a different waitForZk value
>  # bin/solr contains script to set -DwaitForZk from the SOLR_WAIT_FOR_ZK 
> environment property 
> [https://github.com/apache/lucene-solr/blob/master/solr/bin/solr#L2148] but 
> there is no corresponding assignment in bin/solr.cmd, while SOLR_WAIT_FOR_ZK 
> appears in the solr.in.cmd as an example.
>  
> I will attach a patch that fixes the above.




[jira] [Updated] (SOLR-14503) Solr does not respect waitForZk (SOLR_WAIT_FOR_ZK) property

2020-09-24 Thread Munendra S N (Jira)



 [ 
https://issues.apache.org/jira/browse/SOLR-14503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Munendra S N updated SOLR-14503:

Fix Version/s: 8.7
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

Thanks [~cjcowie]

> Solr does not respect waitForZk (SOLR_WAIT_FOR_ZK) property
> ---
>
> Key: SOLR-14503
> URL: https://issues.apache.org/jira/browse/SOLR-14503
> Project: Solr
>  Issue Type: Bug
>Affects Versions: 7.1, 7.2, 7.2.1, 7.3, 7.3.1, 7.4, 7.5, 7.6, 7.7, 7.7.1, 
> 7.7.2, 8.0, 8.1, 8.2, 7.7.3, 8.1.1, 8.3, 8.4, 8.3.1, 8.5, 8.4.1, 8.5.1
>Reporter: Colvin Cowie
>Assignee: Munendra S N
>Priority: Minor
> Fix For: 8.7
>
> Attachments: SOLR-14503.patch, SOLR-14503.patch
>
>
> When starting Solr in cloud mode, if zookeeper is not available within 30 
> seconds, then core container initialization fails and the node will not 
> recover when zookeeper is available.
>  
> I believe SOLR-5129 should have addressed this issue, however it doesn't 
> quite do so for two reasons:
>  # 
> [https://github.com/apache/lucene-solr/blob/master/solr/core/src/java/org/apache/solr/servlet/SolrDispatchFilter.java#L297]
>  it calls {{SolrZkClient(String zkServerAddress, int zkClientTimeout)}} 
> rather than {{SolrZkClient(String zkServerAddress, int zkClientTimeout, int 
> zkClientConnectTimeout)}} so the DEFAULT_CLIENT_CONNECT_TIMEOUT of 30 seconds 
> is used even when you specify a different waitForZk value
>  # bin/solr contains script to set -DwaitForZk from the SOLR_WAIT_FOR_ZK 
> environment property 
> [https://github.com/apache/lucene-solr/blob/master/solr/bin/solr#L2148] but 
> there is no corresponding assignment in bin/solr.cmd, while SOLR_WAIT_FOR_ZK 
> appears in the solr.in.cmd as an example.
>  
> I will attach a patch that fixes the above.




[jira] [Commented] (SOLR-14843) Define strongly-typed cluster configuration API

2020-09-24 Thread Ilan Ginzburg (Jira)



[ 
https://issues.apache.org/jira/browse/SOLR-14843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17201732#comment-17201732
 ] 

Ilan Ginzburg commented on SOLR-14843:
--

I think we need to think through the hierarchy carefully... Node props being overridden 
by cluster props might work, but I can easily think of use cases where the 
opposite makes sense as well, for example configuring a one off node in an 
otherwise homogeneous cluster, but then system properties and environment 
variables (which really are node props) override all the rest...

I believe a proposal needs to also have another dimension of where these 
configurations come from. For system properties and environment variables it's 
pretty simple, but cluster and node props can be in some central place (ZK) or 
can be defined within the Solr distribution (file) and as such can end up being 
different on each node (nothing prevents deploying slightly different images on 
the nodes, or changing the node config after deploy).

What I really need short term is a way to do what {{solr.xml}} allows me to do 
(define default configs, let users change them before deploy if they so 
wish). We do not currently have a replacement for this.

> Define strongly-typed cluster configuration API
> ---
>
> Key: SOLR-14843
> URL: https://issues.apache.org/jira/browse/SOLR-14843
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Andrzej Bialecki
>Priority: Major
>  Labels: clean-api
> Fix For: master (9.0)
>
>
> Current cluster-level configuration uses a hodgepodge of traditional Solr 
> config sources (solr.xml, system properties) and the new somewhat arbitrary 
> config files kept in ZK ({{/clusterprops.json, /security.json, 
> /packages.json, /autoscaling.json}} etc...). There's no uniform 
> strongly-typed API to access and manage these configs - currently each config 
> source has its own CRUD, often relying on direct access to Zookeeper. There's 
> also no uniform method for monitoring changes to these config sources.
> This issue proposes a uniform config API facade with the following 
> characteristics:
>  * Using a single hierarchical (or at least key-based) facade for accessing 
> any global config.
>  * Using strongly-typed sub-system configs instead of opaque Map-s: 
> components would no longer deal with JSON parsing/writing, instead they would 
> use properly annotated Java objects for config CRUD. Config objects would 
> include versioning information (eg. lastModified timestamp).
>  * Isolating access to the underlying config persistence layer: components 
> would no longer directly interact with Zookeeper or files. Most likely the 
> default implementation would continue using different ZK files per-subsystem 
> in order to limit the complexity of file formats and to reduce the cost of 
> notifications for unmodified parts of the configs.
>  * Providing uniform way to register listeners for monitoring changes in 
> specific configs: components would no longer need to interact with ZK 
> watches, they would instead be notified about modified configs that they are 
> interested in.




[jira] [Commented] (SOLR-14889) improve templated variable escaping in ref-guide _config.yml

2020-09-24 Thread Dawid Weiss (Jira)



[ 
https://issues.apache.org/jira/browse/SOLR-14889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17201749#comment-17201749
 ] 

Dawid Weiss commented on SOLR-14889:


bq. references to the variables/input used that code, so that re-assigning to 
those variables in doFirst had no effect, but modifying the objects those 
variables pointed to did.

I think what makes it more complex is that some of these "assignments" are 
actually syntactic sugar for method calls with parameters. So you're calling a 
method with a reference to a local variable - this method stores that reference 
somewhere. When you assign a different object to that variable later nothing 
really happens to the reference previously published. I don't think there's 
much magic involved there, really - it's the syntactic sugar that makes some of 
these calls not obvious. If we added brackets around method calls it'd 
probably be clearer.
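
The same effect can be reproduced in plain Java, without any Gradle involved. This is only an analogy: `Task`, `setProps` and `run` are made-up names standing in for "configure now, execute later", and the map key is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a task that is configured first and executed later.
class CapturedReference {
  static final class Task {
    private Map<String, String> props;

    // "Configuration phase": the task stores the reference it is handed.
    void setProps(Map<String, String> props) {
      this.props = props;
    }

    // "Execution phase": the task reads whatever object it stored earlier.
    String run() {
      return props.toString();
    }
  }

  static String demo() {
    Map<String, String> templateProps = new HashMap<>();
    Task task = new Task();
    task.setProps(templateProps);

    // Mutating the shared object is visible to the task.
    templateProps.put("solr-root-path", "/some/path");

    // Re-assigning the local variable is not: the task still holds the old map.
    templateProps = new HashMap<>();
    templateProps.put("ignored", "true");

    return task.run();
  }

  public static void main(String[] args) {
    System.out.println(demo());
  }
}
```

The task prints only the entry added through mutation, because the second map was never passed to it.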

bq. but shouldn't the "TODO 1" and "TODO 3" comments stick around? ... those 
are still applicable aren't they?

Yes, but it's probably better to file a jira issue than leave them as comments. 

bq. why is buildSiteJekyll now hooked into the "assemble" task?

assemble is a base plugin's convention task for "assembling the outcomes" of a 
project. It's not related to lucene/solr distribution - we can *use* whatever 
is assembled in the packaging project, but we don't have to. When you see an 
unknown gradle project you'd typically run 'gradlew assemble' to build stuff. 
Much like 'mvn package' works. Maven analogy works with "check" too (mvn 
validate). 

Please change the task names according to your expertise - it seemed to me that 
buildSite runs the assemble and validation (check) - It was an arbitrary choice 
on my behalf to just hook it up this way.




> improve templated variable escaping in ref-guide _config.yml
> 
>
> Key: SOLR-14889
> URL: https://issues.apache.org/jira/browse/SOLR-14889
> Project: Solr
>  Issue Type: Task
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: documentation
>Reporter: Chris M. Hostetter
>Assignee: Chris M. Hostetter
>Priority: Major
> Attachments: SOLR-14889.patch, SOLR-14889.patch, SOLR-14889.patch, 
> SOLR-14889.patch, SOLR-14889.patch
>
>
> SOLR-14824 ran into windows failures when we switched from using a hardcoded 
> "relative" path to the solrRootPath to using groovy/project variables to get 
> the path.  The reason for the failures was that the path is used as a 
> variable templated into {{_config.yml.template}} to build the {{_config.yml}} 
> file, but on windows the path separator '\' was being parsed by 
> jekyll/YAML as a string escape character.
> (This wasn't a problem we ran into before, even on windows, prior to the 
> SOLR-14824 changes, because the hardcoded relative path only used '/' 
> delimiters, which (j)ruby was happy to work with, even on windows.)
> As Uwe pointed out when hotfixing this...
> {quote}Problem was that backslashes are used to escape strings, but windows 
> paths also have those. Fix was to add StringEscapeUtils, but I don't like 
> this too much. Maybe we find a better solution to make special characters in 
> those properties escaped correctly when used in strings inside templates.
> {quote}
> ...the current fix of using {{StringEscapeUtils.escapeJava}} -- only for this 
> one variable -- doesn't really protect other variables that might have 
> special characters in them down the road, and while "escapeJava" works ok for 
> the "\" issue, it isn't necessarily consistent with all YAML escapes, which 
> could lead to even weirder bugs/confusion down the road.




[GitHub] [lucene-solr] dweiss commented on pull request #1919: Compute RAM usage ByteBuffersDataOutput on the fly.

2020-09-24 Thread GitBox



dweiss commented on pull request #1919:
URL: https://github.com/apache/lucene-solr/pull/1919#issuecomment-698564766


   I like the explicit field too, actually - even if you're right about 
capacity of internal blocks, Adrien. - this assertion there may actually be my 
mistake and the capacity of a new block should actually be its limit (remaining 
free space)... Or maybe I did have capacity in mind (can't remember, to be 
honest).
   ```
   currentBlock = blockAllocate.apply(requiredBlockSize);
   assert currentBlock.capacity() == requiredBlockSize;
   ```
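
For reference, a small sketch of how `capacity()`, `limit()` and `remaining()` differ on a `java.nio.ByteBuffer`. The class and method names (`BufferCapacityVsLimit`, `describe`) are made up for this illustration and are not part of `ByteBuffersDataOutput`.

```java
import java.nio.ByteBuffer;

// Illustration only; not part of ByteBuffersDataOutput.
class BufferCapacityVsLimit {
  // Returns {capacity, limit, remaining} for the given buffer.
  static int[] describe(ByteBuffer b) {
    return new int[] {b.capacity(), b.limit(), b.remaining()};
  }

  static void print(String label, ByteBuffer b) {
    int[] d = describe(b);
    System.out.println(label + ": capacity=" + d[0] + " limit=" + d[1] + " remaining=" + d[2]);
  }

  public static void main(String[] args) {
    // Freshly allocated: capacity, limit and remaining are all equal.
    ByteBuffer block = ByteBuffer.allocate(16);
    print("fresh", block);

    // Lowering the limit shrinks the usable space but not the capacity.
    block.limit(8);
    print("limit(8)", block);

    // Writing a byte advances the position, so remaining drops by one.
    block.put((byte) 1);
    print("after put", block);
  }
}
```

For a block that was just allocated with the requested size the two checks agree; they only diverge if the allocator hands back a larger buffer with a reduced limit.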




[GitHub] [lucene-solr] goankur commented on a change in pull request #1893: LUCENE-9444 Utility class to get facet labels from taxonomy for a fac…

2020-09-24 Thread GitBox



goankur commented on a change in pull request #1893:
URL: https://github.com/apache/lucene-solr/pull/1893#discussion_r494637269



##
File path: 
lucene/facet/src/test/org/apache/lucene/facet/taxonomy/TestTaxonomyFacetCounts.java
##
@@ -711,6 +723,11 @@ public void testRandom() throws Exception {
 }
   }
 
+  // Test facet labels for each matching test doc
+  List<List<FacetLabel>> actualLabels = getTaxonomyFacetLabels(tr, config, fc);
+  assertEquals(expectedLabels.size(), actualLabels.size());

Review comment:
   Nice catch, thanks.  I fixed `actualLabels` generation in 
`FacetTestCase.getAllTaxonomyFacetLabels()` method to filter out empty 
`List<FacetLabel>`.






[jira] [Commented] (SOLR-14889) improve templated variable escaping in ref-guide _config.yml

2020-09-24 Thread Uwe Schindler (Jira)



[ 
https://issues.apache.org/jira/browse/SOLR-14889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17201802#comment-17201802
 ] 

Uwe Schindler commented on SOLR-14889:
--

bq. can we make `templateProps` final now? ... i didn't realize groovy 
supported final as a keyword, and seems like a good idea for as many things as 
possible to be final

No, that's not possible. You can only make local variables or real class members 
final in Groovy. The code I added was just a local variable that was accessed 
from the closure. The "ext" properties are properties and can be modified at 
any time. I was just saying: they should only be used during the configuration 
phase; during task execution, nothing should change them.

bq. why is buildSiteJekyll now hooked into the "assemble" task?

As Dawid said, assemble is per project. And the site is already built in 
"check", so as it's already there you can also assemble it. I am planning to 
also make the global javadocs/documentation a separate project with assemble. 
FYI, I commented in the assemble dependency.

> improve templated variable escaping in ref-guide _config.yml
> 
>
> Key: SOLR-14889
> URL: https://issues.apache.org/jira/browse/SOLR-14889
> Project: Solr
>  Issue Type: Task
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: documentation
>Reporter: Chris M. Hostetter
>Assignee: Chris M. Hostetter
>Priority: Major
> Attachments: SOLR-14889.patch, SOLR-14889.patch, SOLR-14889.patch, 
> SOLR-14889.patch, SOLR-14889.patch
>
>
> SOLR-14824 ran into Windows failures when we switched from using a hardcoded 
> "relative" path to the solrRootPath to using groovy/project variables to get 
> the path. The reason for the failures was that the path is used as a 
> variable templated into {{_config.yml.template}} to build the {{_config.yml}} 
> file, but on Windows the path separator '\' was being parsed by 
> jekyll/YAML as a string escape character.
> (This wasn't a problem we ran into before, even on Windows, prior to the 
> SOLR-14824 changes, because the hardcoded relative path only used '/' 
> delimiters, which (j)ruby was happy to work with, even on Windows.)
> As Uwe pointed out when hotfixing this...
> {quote}Problem was that backslashes are used to escape strings, but Windows 
> paths also have those. Fix was to add StringEscapeUtils, but I don't like 
> this too much. Maybe we find a better solution to make special characters in 
> those properties escaped correctly when used in strings inside templates.
> {quote}
> ...the current fix of using {{StringEscapeUtils.escapeJava}} -- only for this 
> one variable -- doesn't really protect other variables that might have 
> special characters in them down the road, and while "escapeJava" works OK for 
> the "\" issue, it isn't necessarily consistent with all YAML escapes, which 
> could lead to even weirder bugs/confusion down the road.
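
To make the escaping concern concrete, here is a small sketch of one YAML-specific alternative. It is not the fix that was committed for this issue. It assumes the template places the value inside a single-quoted YAML scalar, where a backslash is taken literally and the only escape is a doubled single quote. The class name {{YamlQuoteSketch}} and method {{singleQuoted}} are hypothetical, and multi-line values are not handled.

```java
public final class YamlQuoteSketch {
  /**
   * Wraps a value as a YAML single-quoted scalar. Inside single quotes YAML
   * does not interpret backslashes; a literal single quote is written as ''.
   * Assumption: the value contains no line breaks.
   */
  static String singleQuoted(String value) {
    return "'" + value.replace("'", "''") + "'";
  }

  public static void main(String[] args) {
    // A Windows-style path keeps its backslashes untouched.
    System.out.println(singleQuoted("C:\\work\\lucene-solr"));
    // An embedded single quote is doubled.
    System.out.println(singleQuoted("it's"));
  }
}
```

Compared with Java-style escaping, this only has to deal with one special character, but it only works if every templated value is emitted in single quotes.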




[GitHub] [lucene-solr] goankur commented on a change in pull request #1893: LUCENE-9444 Utility class to get facet labels from taxonomy for a fac…

2020-09-24 Thread GitBox



goankur commented on a change in pull request #1893:
URL: https://github.com/apache/lucene-solr/pull/1893#discussion_r494654559



##
File path: lucene/facet/src/test/org/apache/lucene/facet/FacetTestCase.java
##
@@ -56,6 +60,28 @@ public Facets getTaxonomyFacetCounts(TaxonomyReader 
taxoReader, FacetsConfig con
 return facets;
   }
 
+  public List<List<FacetLabel>> getTaxonomyFacetLabels(TaxonomyReader taxoReader, FacetsConfig config, FacetsCollector fc) throws IOException {

Review comment:
   done in the next revision.






[GitHub] [lucene-solr] goankur commented on a change in pull request #1893: LUCENE-9444 Utility class to get facet labels from taxonomy for a fac…

2020-09-24 Thread GitBox



goankur commented on a change in pull request #1893:
URL: https://github.com/apache/lucene-solr/pull/1893#discussion_r494655549



##
File path: 
lucene/facet/src/test/org/apache/lucene/facet/taxonomy/TestTaxonomyFacetCounts.java
##
@@ -726,6 +743,39 @@ public void testRandom() throws Exception {
 IOUtils.close(tw, searcher.getIndexReader(), tr, indexDir, taxoDir);
   }
 
+  private static List<List<FacetLabel>> sortedFacetLabels(List<List<FacetLabel>> allfacetLabels) {
+for (List<FacetLabel> facetLabels : allfacetLabels) {
+  Collections.sort(facetLabels);
+}
+
+Collections.sort(allfacetLabels, (o1, o2) -> {
+  if (o1 == null) {

Review comment:
   Thanks for catching this @mikemccand. I fixed the `actualLabels` to 
exclude empty lists. The null checks were just me being extra cautious. I 
realized they were unnecessary and removed them :-)






[jira] [Assigned] (SOLR-14829) Default components are missing facet_module and terms in documentation

2020-09-24 Thread Alexandre Rafalovitch (Jira)



 [ 
https://issues.apache.org/jira/browse/SOLR-14829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexandre Rafalovitch reassigned SOLR-14829:


Assignee: Alexandre Rafalovitch  (was: Ishan Chattopadhyaya)

> Default components are missing facet_module and terms in documentation
> --
>
> Key: SOLR-14829
> URL: https://issues.apache.org/jira/browse/SOLR-14829
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: documentation, examples
>Affects Versions: 8.6.2
>Reporter: Johannes Baiter
>Assignee: Alexandre Rafalovitch
>Priority: Minor
> Attachments: SOLR-14829.patch
>
>
> In the reference guide, the list of search components that are enabled by 
> default is missing the {{facet_module}} and {{terms}} components. The terms 
> component is instead listed under "other useful components", while the 
> {{FacetModule}} is never listed anywhere in the documentation, despite it 
> being necessary for the JSON Facet API to work.
> This is also how I stumbled upon this: I spent hours trying to figure out why 
> JSON-based faceting was not working with my setup. After taking a glance at 
> the {{SearchHandler}} source code based on a hunch, it became clear that my 
> custom list of search components (created based on the list in the reference 
> guide) was to blame.
> A patch for the documentation gap is attached, but I think there are some 
> other issues with the naming/documentation around the two faceting APIs that 
> may be worth discussing:
>  * The names {{facet_module}} / {{FacetModule}} are very misleading, since 
> the documentation is always talking about the "JSON Facet API", but the term 
> "JSON" does not appear in the name of the component nor does the component 
> have any documentation attached that mentions this
>  * Why is the {{FacetModule}} class located in the {{search.facet}} package 
> while every single other search component included in the core is located in 
> the {{handler.component}} package?




[GitHub] [lucene-solr] goankur commented on a change in pull request #1893: LUCENE-9444 Utility class to get facet labels from taxonomy for a fac…

2020-09-24 Thread GitBox



goankur commented on a change in pull request #1893:
URL: https://github.com/apache/lucene-solr/pull/1893#discussion_r494665188



##
File path: 
lucene/facet/src/test/org/apache/lucene/facet/taxonomy/TestTaxonomyFacetCounts.java
##
@@ -726,6 +743,39 @@ public void testRandom() throws Exception {
 IOUtils.close(tw, searcher.getIndexReader(), tr, indexDir, taxoDir);
   }
 
+  private static List<List<FacetLabel>> sortedFacetLabels(List<List<FacetLabel>> allfacetLabels) {
+for (List<FacetLabel> facetLabels : allfacetLabels) {
+  Collections.sort(facetLabels);
+}
+
+Collections.sort(allfacetLabels, (o1, o2) -> {

Review comment:
   Yes, a document with `N`th position in the input sequence 
might end up with `K`th docId in a random segment making it harder 
to compare actual and expected labels.
   
   Thanks for confirming that the approach is acceptable.
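
   The idea of sorting both sides before comparing can be sketched independently of Lucene. The example below is an illustration only, not the test code from this pull request: it uses plain `String` labels instead of `FacetLabel`, and the class name `SortedLabelsSketch` and method `normalized` are hypothetical. Each inner list is sorted first, then the outer list is sorted lexicographically, so two collections holding the same labels in different document order compare equal.

   ```java
   import java.util.ArrayList;
   import java.util.Arrays;
   import java.util.Collections;
   import java.util.List;

   public final class SortedLabelsSketch {
     /** Returns a sorted deep copy so the input lists are left untouched. */
     static List<List<String>> normalized(List<List<String>> all) {
       List<List<String>> copy = new ArrayList<>();
       for (List<String> labels : all) {
         List<String> sorted = new ArrayList<>(labels);
         Collections.sort(sorted);
         copy.add(sorted);
       }
       // Order the per-document lists element by element, shorter list first on a tie.
       copy.sort((a, b) -> {
         int n = Math.min(a.size(), b.size());
         for (int i = 0; i < n; i++) {
           int cmp = a.get(i).compareTo(b.get(i));
           if (cmp != 0) {
             return cmp;
           }
         }
         return Integer.compare(a.size(), b.size());
       });
       return copy;
     }

     public static void main(String[] args) {
       List<List<String>> expected = Arrays.asList(
           Arrays.asList("b", "a"), Arrays.asList("c"));
       List<List<String>> actual = Arrays.asList(
           Arrays.asList("c"), Arrays.asList("a", "b"));
       // Same labels, different document and label order.
       System.out.println(normalized(expected).equals(normalized(actual)));
     }
   }
   ```

   One trade-off of this approach is that it no longer checks which document carried which labels, only that the same multiset of label lists was produced.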






[jira] [Commented] (SOLR-14829) Default components are missing facet_module and terms in documentation

2020-09-24 Thread Alexandre Rafalovitch (Jira)



[ 
https://issues.apache.org/jira/browse/SOLR-14829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17201815#comment-17201815
 ] 

Alexandre Rafalovitch commented on SOLR-14829:
--

The patch no longer applies cleanly because of some other documentation 
changes I made. But I also did some related research and want to clean up the 
default components information in the RefGuide and solrconfig files. So, I took the 
issue over from Ishan and will make it a bit more generic.





[jira] [Updated] (SOLR-14829) Update default components information in Reference Guide and solrconfig.xml files

2020-09-24 Thread Alexandre Rafalovitch (Jira)



 [ 
https://issues.apache.org/jira/browse/SOLR-14829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alexandre Rafalovitch updated SOLR-14829:
-
Summary: Update default components information in Reference Guide and 
solrconfig.xml files  (was: Default components are missing facet_module and 
terms in documentation)

> Update default components information in Reference Guide and solrconfig.xml 
> files
> -
>
> Key: SOLR-14829
> URL: https://issues.apache.org/jira/browse/SOLR-14829
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: documentation, examples
>Affects Versions: 8.6.2
>Reporter: Johannes Baiter
>Assignee: Alexandre Rafalovitch
>Priority: Minor
> Attachments: SOLR-14829.patch
>
>
> In the reference guide, the list of search components that are enabled by 
> default is missing the {{facet_module}} and {{terms}} components. The terms 
> component is instead listed under "other useful components", while the 
> {{FacetModule}} is never listed anywhere in the documentation, despite it 
> being necessary for the JSON Facet API to work.
> This is also how I stumbled upon this: I spent hours trying to figure out why 
> JSON-based faceting was not working with my setup. After taking a glance at 
> the {{SearchHandler}} source code based on a hunch, it became clear that my 
> custom list of search components (created based on the list in the reference 
> guide) was to blame.
> A patch for the documentation gap is attached, but I think there are some 
> other issues with the naming/documentation around the two faceting APIs that 
> may be worth discussing:
>  * The names {{facet_module}} / {{FacetModule}} are very misleading, since 
> the documentation is always talking about the "JSON Facet API", but the term 
> "JSON" does not appear in the name of the component nor does the component 
> have any documentation attached that mentions this
>  * Why is the {{FacetModule}} class located in the {{search.facet}} package 
> while every single other search component included in the core is located in 
> the {{handler.component}} package?




[GitHub] [lucene-solr] goankur commented on a change in pull request #1893: LUCENE-9444 Utility class to get facet labels from taxonomy for a fac…

2020-09-24 Thread GitBox



goankur commented on a change in pull request #1893:
URL: https://github.com/apache/lucene-solr/pull/1893#discussion_r494675000



##
File path: 
lucene/facet/src/java/org/apache/lucene/facet/taxonomy/TaxonomyFacetLabels.java
##
@@ -0,0 +1,184 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.facet.taxonomy;
+
+import org.apache.lucene.facet.FacetsConfig;
+import org.apache.lucene.index.LeafReaderContext;
+import org.apache.lucene.util.IntsRef;
+
+import java.io.IOException;
+
+import static org.apache.lucene.facet.taxonomy.TaxonomyReader.INVALID_ORDINAL;
+import static org.apache.lucene.facet.taxonomy.TaxonomyReader.ROOT_ORDINAL;
+
+/**
+ * Utility class to easily retrieve previously indexed facet labels, allowing 
you to skip also adding stored fields for these values,
+ * reducing your index size.
+ *
+ * @lucene.experimental
+ **/
+public class TaxonomyFacetLabels {
+
+  /**
+   * Index field name provided to the constructor
+   */
+  private final String indexFieldName;
+
+  /**
+   * {@code TaxonomyReader} provided to the constructor
+   */
+  private final TaxonomyReader taxoReader;
+
+  /**
+   * {@code FacetsConfig} provided to the constructor
+   */
+  private final FacetsConfig config;
+
+  /**
+   * {@code OrdinalsReader} to decode ordinals previously indexed into the 
{@code BinaryDocValues} facet field
+   */
+  private final OrdinalsReader ordsReader;
+
+  /**
+   * Sole constructor.  Do not close the provided {@link TaxonomyReader} while 
still using this instance!
+   */
+  public TaxonomyFacetLabels(TaxonomyReader taxoReader, FacetsConfig config, 
String indexFieldName) throws IOException {
+this.taxoReader = taxoReader;
+this.config = config;
+this.indexFieldName = indexFieldName;
+this.ordsReader = new DocValuesOrdinalsReader(indexFieldName);
+  }
+
+  /**
+   * Create and return an instance of {@link FacetLabelReader} to retrieve 
facet labels for
+   * multiple documents and (optionally) for a specific dimension.  You must 
create this per-segment,
+   * and then step through all hits, in order, for that segment.
+   *
+   * NOTE: This class is not thread-safe, so you must use a new 
instance of this
+   * class for each thread.
+   *
+   * @param readerContext LeafReaderContext used to access the {@code 
BinaryDocValues} facet field
+   * @return an instance of {@link FacetLabelReader}
+   * @throws IOException when a low-level IO issue occurs
+   */
+  public FacetLabelReader getFacetLabelReader(LeafReaderContext readerContext) 
throws IOException {
+return new FacetLabelReader(ordsReader, readerContext);
+  }
+
+  /**
+   * Utility class to retrieve facet labels for multiple documents.
+   *
+   * @lucene.experimental
+   */
+  public class FacetLabelReader {
+private final OrdinalsReader.OrdinalsSegmentReader ordinalsSegmentReader;
+private final IntsRef decodedOrds = new IntsRef();
+private int currentDocId = -1;
+private int currentPos = -1;
+
+// Lazily set when nextFacetLabel(int docId, String facetDimension) is 
first called
+private int[] parents;
+
+/**
+ * Sole constructor.
+ */
+public FacetLabelReader(OrdinalsReader ordsReader, LeafReaderContext 
readerContext) throws IOException {
+  ordinalsSegmentReader = ordsReader.getReader(readerContext);
+}
+
+/**
+ * Retrieves the next {@link FacetLabel} for the specified {@code docId}, 
or {@code null} if there are no more.
+ * This method has state: if the provided {@code docId} is the same as the 
previous invocation, it returns the
+ * next {@link FacetLabel} for that document.  Otherwise, it advances to 
the new {@code docId} and provides the
+ * first {@link FacetLabel} for that document, or {@code null} if that 
document has no indexed facets.  Each
+ * new {@code docId} must be in strictly monotonic (increasing) order.
+ *
+ * @param docId input docId provided in monotonic (non-decreasing) order
+ * @return the first or next {@link FacetLabel}, or {@code null} if there 
are no more
+ * @throws IOException when a low-level IO issue occurs
+ */
+public FacetLabel nextFacetLabel(int docId) throws IOException {

[jira] [Commented] (SOLR-14829) Update default components information in Reference Guide and solrconfig.xml files

2020-09-24 Thread Alexandre Rafalovitch (Jira)



[ 
https://issues.apache.org/jira/browse/SOLR-14829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17201841#comment-17201841
 ] 

Alexandre Rafalovitch commented on SOLR-14829:
--

Ok, the [Reference Guide 
page|https://lucene.apache.org/solr/guide/8_6/requesthandlers-and-searchcomponents-in-solrconfig.html]
 is rather a mess. Here are a few things that need cleaning up:
* defaults, appends, and invariants are defined as part of SearchHandler, 
but should be described for all handlers
* we mention initParams (in the general section, good) but not useParams
* it is very hard to notice that the SearchComponents references (components, 
first-components, last-components) are actually a section *inside* SearchHandlers 
only, partially because they are so far apart
* we don't actually explain how to declare a custom search component explicitly 
(some definitions are available on linked pages)
* we don't have an example of UpdateRequestHandlers either in the doc or in 
solrconfig.xml (because they all became implicit)
* we don't mention UpdateRequestProcessors, which could be viewed as a parallel 
pipeline to SearchComponents

I am going to try refactoring that page.





[jira] [Commented] (SOLR-14613) Provide a clean API for pluggable replica assignment implementations

2020-09-24 Thread Noble Paul (Jira)



[ 
https://issues.apache.org/jira/browse/SOLR-14613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17201915#comment-17201915
 ] 

Noble Paul commented on SOLR-14613:
---

The 2nd parameter is a varargs parameter, so you can set a deeply nested value.


My other concern is why have we committed a large amount of code without a 
single test case?
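
As a generic illustration of the varargs point above (the comment does not name the method, so this is not Solr's API), a setter whose second parameter is a varargs path can create intermediate maps on the way down. The class {{NestedValueSketch}} and its methods {{set}} and {{asMap}} are hypothetical names chosen for this sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public final class NestedValueSketch {
  private final Map<String, Object> root = new LinkedHashMap<>();

  /** Stores value under the given path, creating nested maps as needed. */
  @SuppressWarnings("unchecked")
  void set(Object value, String... path) {
    if (path.length == 0) {
      throw new IllegalArgumentException("path must not be empty");
    }
    Map<String, Object> current = root;
    for (int i = 0; i < path.length - 1; i++) {
      Object child = current.get(path[i]);
      if (!(child instanceof Map)) {
        // Replace a missing or non-map entry with a fresh nested map.
        child = new LinkedHashMap<String, Object>();
        current.put(path[i], child);
      }
      current = (Map<String, Object>) child;
    }
    current.put(path[path.length - 1], value);
  }

  Map<String, Object> asMap() {
    return root;
  }

  public static void main(String[] args) {
    NestedValueSketch props = new NestedValueSketch();
    props.set(3, "a", "b", "c");
    System.out.println(props.asMap());
  }
}
```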





[GitHub] [lucene-solr] noblepaul commented on pull request #1863: SOLR-14701: GuessSchemaFields URP to replace AddSchemaFields URP in schemaless mode

2020-09-24 Thread GitBox



noblepaul commented on pull request #1863:
URL: https://github.com/apache/lucene-solr/pull/1863#issuecomment-698728576


   I recommend a new request handler such as `/update/guess-schema`
   
   This way we do not need to add any new functionality, nor do we need to pass 
any extra params
   
   ```
   curl -F 'data=@datafile.json' http://localhost:8983/solr/gettingstarted/update/guess-schema
   ```
   The response can be 
   
   ```
   curl -X POST -H 'Content-type: application/json' -d '{"add-field":[
     {
       "name":"id",
       "type":"string",
       "stored":true
     },
     {
       "name":"desc",
       "type":"text",
       "stored":true
     }
   ]}' http://localhost:8983/solr/gettingstarted/schema
   ```




[jira] [Commented] (LUCENE-9535) Investigate recent indexing slowdown for wikimedium documents

2020-09-24 Thread Adrien Grand (Jira)



[ 
https://issues.apache.org/jira/browse/LUCENE-9535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17201924#comment-17201924
 ] 

Adrien Grand commented on LUCENE-9535:
--

Indexing throughput looks like it's back to where it was before including 
stored fields in DWPT accounting: 
https://home.apache.org/~mikemccand/lucenebench/indexing.html

> Investigate recent indexing slowdown for wikimedium documents
> -
>
> Key: LUCENE-9535
> URL: https://issues.apache.org/jira/browse/LUCENE-9535
> Project: Lucene - Core
>  Issue Type: Task
>Reporter: Adrien Grand
>Priority: Minor
> Attachments: cpu_profile.svg
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Nightly benchmarks report a ~10% slowdown for 1kB documents as of September 
> 9th: [http://people.apache.org/~mikemccand/lucenebench/indexing.html].
> On that day, we added stored fields in DWPT accounting (LUCENE-9511), so I 
> first thought this could be due to smaller flushed segments and more merging, 
> but I still wonder whether there's something else. The benchmark runs with 
> 8GB of heap, 2GB of RAM buffer and 36 indexing threads. So it's about 2GB/36 
> = 57MB of RAM buffer per thread in the worst-case scenario that all DWPTs get 
> full at the same time. Stored fields account for about 0.7MB of memory, or 1% 
> of the indexing buffer size. How can a 1% reduction of buffering capacity 
> explain a 10% indexing slowdown? I looked into this further by running 
> indexing benchmarks locally with 8 indexing threads and 128MB of indexing 
> buffer memory, which would make this issue even more apparent if the smaller 
> RAM buffer was the cause, but I'm not seeing a regression and actually I'm 
> seeing similar number of flushes when I disabled memory accounting for stored 
> fields.
> I ran indexing under a profiler to see whether something else could cause 
> this slowdown, e.g. slow implementations of ramBytesUsed on stored fields 
> writers, but nothing surprising showed up and the profile looked just like I 
> would have expected.
> Another question I have is why the 4kB benchmark is not affected at all.
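
The per-thread arithmetic in the description can be checked with a few lines of Java. This is only a worked calculation using the figures quoted above; it assumes 2GB means 2048MB, and the class and method names are made up for the example.

```java
import java.util.Locale;

public final class RamBufferArithmetic {
  /** RAM buffer available to each indexing thread if all fill up together. */
  static double perThreadMb(double totalBufferMb, int threads) {
    return totalBufferMb / threads;
  }

  /** Share of a buffer taken by one component, as a fraction between 0 and 1. */
  static double fractionOf(double partMb, double wholeMb) {
    return partMb / wholeMb;
  }

  public static void main(String[] args) {
    double perThread = perThreadMb(2048.0, 36);        // 2GB of RAM buffer, 36 threads
    double storedShare = fractionOf(0.7, perThread);   // about 0.7MB of stored fields
    // Prints: 56.9 MB per thread, stored fields use 1.2%
    System.out.println(String.format(Locale.ROOT,
        "%.1f MB per thread, stored fields use %.1f%%", perThread, 100 * storedShare));
  }
}
```

The result agrees with the rounded figures in the description (about 57MB per thread and roughly 1% for stored fields), which is why a 10% slowdown looked out of proportion.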



