[GitHub] [lucene-solr] dweiss commented on a change in pull request #1550: LUCENE-9383: benchmark module: Gradle conversion (complete)

2020-06-03 Thread GitBox


dweiss commented on a change in pull request #1550:
URL: https://github.com/apache/lucene-solr/pull/1550#discussion_r434352523



##
File path: lucene/benchmark/build.gradle
##
@@ -37,5 +37,121 @@ dependencies {
 exclude module: "xml-apis"
   })
 
+  runtimeOnly project(':lucene:analysis:icu')
+
   testImplementation project(':lucene:test-framework')
 }
+
+def tempDir = file("temp")
+def workDir = file("work")
+
+task run(type: JavaExec) {
+  description "Run a perf test (optional: -PtaskAlg=conf/your-algorithm-file -PmaxHeapSize=1G)"
+  main 'org.apache.lucene.benchmark.byTask.Benchmark'
+  classpath sourceSets.main.runtimeClasspath
+  // allow these to be specified on the CLI via -PtaskAlg=  for example
+  def taskAlg = propertyOrDefault('taskAlg', 'conf/micro-standard.alg')

Review comment:
   I'd just inline taskAlg into the array for brevity, but it's fine as is 
too.
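
   For illustration, the inlined form would be (a sketch of the suggestion, not necessarily the committed code):

{code:groovy}
// inlining taskAlg directly into the args array:
args = [propertyOrDefault('taskAlg', 'conf/micro-standard.alg')]
{code}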

##
File path: lucene/benchmark/build.gradle
##
@@ -37,5 +37,121 @@ dependencies {
 exclude module: "xml-apis"
   })
 
+  runtimeOnly project(':lucene:analysis:icu')
+
   testImplementation project(':lucene:test-framework')
 }
+
+def tempDir = file("temp")
+def workDir = file("work")
+
+task run(type: JavaExec) {
+  description "Run a perf test (optional: -PtaskAlg=conf/your-algorithm-file -PmaxHeapSize=1G)"
+  main 'org.apache.lucene.benchmark.byTask.Benchmark'
+  classpath sourceSets.main.runtimeClasspath
+  // allow these to be specified on the CLI via -PtaskAlg=  for example
+  def taskAlg = propertyOrDefault('taskAlg', 'conf/micro-standard.alg')
+  args = [taskAlg]
+
+  maxHeapSize = propertyOrDefault('maxHeapSize', '1G')
+
+  String stdOutStr = propertyOrDefault('standardOutput', null)

Review comment:
   Just had a random thought: if you don't redirect to a file, the output is piped between gradle (the parent) and the forked process, and this may cause artificial slowdowns on the buffers between processes... Don't know if this matters, but an alternative design could create a temporary file (the task class has a method for creating task-relative temporary files), redirect the output into that file (always), and only pipe it to the console at the end if stdOutStr is not defined.
   
   I really don't know how these benchmarks are used in practice but I wanted 
to signal a potential issue here.
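
   A minimal sketch of that alternative, assuming the quoted task above, the build's existing propertyOrDefault helper, and Gradle's Task.getTemporaryDir() as the task-relative temp-file mechanism:

{code:groovy}
task run(type: JavaExec) {
  // ... existing configuration from the hunk above ...
  def outFile = new File(temporaryDir, "benchmark-output.txt") // task-relative temp dir
  standardOutput = new FileOutputStream(outFile)               // always capture to the file
  doLast {
    // pipe to the console at the end only if no explicit output file was requested
    if (propertyOrDefault('standardOutput', null) == null) {
      println outFile.text
    }
  }
}
{code}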

##
File path: lucene/benchmark/build.gradle
##
@@ -15,13 +15,13 @@
  * limitations under the License.
  */
 
-
-apply plugin: 'java-library'
+apply plugin: 'java'
+// NOT a 'java-library'.  Maybe 'application' but seems too limiting.

Review comment:
   I think the java plugin is more than fine here, so maybe remove the comment for the final version?








[jira] [Commented] (SOLR-14520) json.facets: allBucket:true can cause server errors when combined with refine:true

2020-06-03 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17125082#comment-17125082
 ] 

ASF subversion and git services commented on SOLR-14520:


Commit fb58f433fbed8f961bce88961084202428ef287a in lucene-solr's branch 
refs/heads/master from Chris M. Hostetter
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=fb58f43 ]

SOLR-14520: Fixed server errors from the json.facet allBuckets:true option when 
combined with refine:true


> json.facets: allBucket:true can cause server errors when combined with 
> refine:true
> --
>
> Key: SOLR-14520
> URL: https://issues.apache.org/jira/browse/SOLR-14520
> Project: Solr
>  Issue Type: Bug
>  Components: Facet Module
>Reporter: Chris M. Hostetter
>Priority: Major
> Attachments: SOLR-14520.patch, SOLR-14520.patch, SOLR-14520.patch
>
>
> Another bug that was discovered while testing SOLR-14467...
> In some situations, using {{allBuckets:true}} in conjunction with 
> {{refine:true}} can cause server errors during the "refinement" requests to 
> the individual shards -- either NullPointerExceptions from some (nested) 
> SlotAccs when SpecialSlotAcc tries to collect them, or 
> ArrayIndexOutOfBoundsException from CountSlotArrAcc.incrementCount because 
> it's asked to collect to "large" slot# values even though it's been 
> initialized with a size of '1'.
> NOTE: these problems may be specific to FacetFieldProcessorByArrayDV - I have 
> not yet seen similar failures from FacetFieldProcessorByArrayUIF (those are 
> the only 2 used when doing refinement) but that may just be a fluke of 
> testing.






[GitHub] [lucene-solr] dsmiley commented on a change in pull request #1550: LUCENE-9383: benchmark module: Gradle conversion (complete)

2020-06-03 Thread GitBox


dsmiley commented on a change in pull request #1550:
URL: https://github.com/apache/lucene-solr/pull/1550#discussion_r434727518



##
File path: lucene/benchmark/build.gradle
##
@@ -37,5 +37,121 @@ dependencies {
 exclude module: "xml-apis"
   })
 
+  runtimeOnly project(':lucene:analysis:icu')
+
   testImplementation project(':lucene:test-framework')
 }
+
+def tempDir = file("temp")
+def workDir = file("work")
+
+task run(type: JavaExec) {
+  description "Run a perf test (optional: -PtaskAlg=conf/your-algorithm-file -PmaxHeapSize=1G)"
+  main 'org.apache.lucene.benchmark.byTask.Benchmark'
+  classpath sourceSets.main.runtimeClasspath
+  // allow these to be specified on the CLI via -PtaskAlg=  for example
+  def taskAlg = propertyOrDefault('taskAlg', 'conf/micro-standard.alg')
+  args = [taskAlg]
+
+  maxHeapSize = propertyOrDefault('maxHeapSize', '1G')
+
+  String stdOutStr = propertyOrDefault('standardOutput', null)

Review comment:
   ehh; I'd prefer to keep this the way it is.  The code/scripts in the alg 
files generally don't print tons of output, so I don't think there's a perf 
interference concern.








[GitHub] [lucene-solr] dsmiley commented on a change in pull request #1550: LUCENE-9383: benchmark module: Gradle conversion (complete)

2020-06-03 Thread GitBox


dsmiley commented on a change in pull request #1550:
URL: https://github.com/apache/lucene-solr/pull/1550#discussion_r434727998



##
File path: lucene/benchmark/build.gradle
##
@@ -15,13 +15,13 @@
  * limitations under the License.
  */
 
-
-apply plugin: 'java-library'
+apply plugin: 'java'
+// NOT a 'java-library'.  Maybe 'application' but seems too limiting.

Review comment:
   I like that this comment spells out a difference from how all the other 
modules are.








[jira] [Commented] (SOLR-14520) json.facets: allBucket:true can cause server errors when combined with refine:true

2020-06-03 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17125149#comment-17125149
 ] 

ASF subversion and git services commented on SOLR-14520:


Commit bbcd43366e873918b065297654dccfbfc899dc9f in lucene-solr's branch 
refs/heads/branch_8x from Chris M. Hostetter
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=bbcd433 ]

SOLR-14520: Fixed server errors from the json.facet allBuckets:true option when 
combined with refine:true

(cherry picked from commit fb58f433fbed8f961bce88961084202428ef287a)


> json.facets: allBucket:true can cause server errors when combined with 
> refine:true
> --
>
> Key: SOLR-14520
> URL: https://issues.apache.org/jira/browse/SOLR-14520
> Project: Solr
>  Issue Type: Bug
>  Components: Facet Module
>Reporter: Chris M. Hostetter
>Priority: Major
> Attachments: SOLR-14520.patch, SOLR-14520.patch, SOLR-14520.patch
>
>
> Another bug that was discovered while testing SOLR-14467...
> In some situations, using {{allBuckets:true}} in conjunction with 
> {{refine:true}} can cause server errors during the "refinement" requests to 
> the individual shards -- either NullPointerExceptions from some (nested) 
> SlotAccs when SpecialSlotAcc tries to collect them, or 
> ArrayIndexOutOfBoundsException from CountSlotArrAcc.incrementCount because 
> it's asked to collect to "large" slot# values even though it's been 
> initialized with a size of '1'.
> NOTE: these problems may be specific to FacetFieldProcessorByArrayDV - I have 
> not yet seen similar failures from FacetFieldProcessorByArrayUIF (those are 
> the only 2 used when doing refinement) but that may just be a fluke of 
> testing.






[jira] [Updated] (SOLR-14520) json.facets: allBucket:true can cause server errors when combined with refine:true

2020-06-03 Thread Chris M. Hostetter (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris M. Hostetter updated SOLR-14520:
--
Fix Version/s: 8.6
   master (9.0)
 Assignee: Chris M. Hostetter
   Resolution: Fixed
   Status: Resolved  (was: Patch Available)

Thanks [~mgibney] !

> json.facets: allBucket:true can cause server errors when combined with 
> refine:true
> --
>
> Key: SOLR-14520
> URL: https://issues.apache.org/jira/browse/SOLR-14520
> Project: Solr
>  Issue Type: Bug
>  Components: Facet Module
>Reporter: Chris M. Hostetter
>Assignee: Chris M. Hostetter
>Priority: Major
> Fix For: master (9.0), 8.6
>
> Attachments: SOLR-14520.patch, SOLR-14520.patch, SOLR-14520.patch
>
>
> Another bug that was discovered while testing SOLR-14467...
> In some situations, using {{allBuckets:true}} in conjunction with 
> {{refine:true}} can cause server errors during the "refinement" requests to 
> the individual shards -- either NullPointerExceptions from some (nested) 
> SlotAccs when SpecialSlotAcc tries to collect them, or 
> ArrayIndexOutOfBoundsException from CountSlotArrAcc.incrementCount because 
> it's asked to collect to "large" slot# values even though it's been 
> initialized with a size of '1'.
> NOTE: these problems may be specific to FacetFieldProcessorByArrayDV - I have 
> not yet seen similar failures from FacetFieldProcessorByArrayUIF (those are 
> the only 2 used when doing refinement) but that may just be a fluke of 
> testing.






[jira] [Commented] (SOLR-14525) For components loaded from packages SolrCoreAware, ResourceLoaderAware are not honored

2020-06-03 Thread Chris M. Hostetter (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17125173#comment-17125173
 ] 

Chris M. Hostetter commented on SOLR-14525:
---

On branch_8x, git bisect has identified commit 
e0b7984b140c4ecc9f435a22fd557fbcea30b171 as being the cause of multiple suite 
level failures that reproduce for me regardless of seed...

 
{noformat}
   [junit4]   2> NOTE: reproduce with: ant test  
-Dtestcase=TestReplicationHandlerDiskOverFlow -Dtests.seed=33B6ECFD73638B2D 
-Dtests.slow=true -Dtests.badapples=true -Dtests.locale=pt-BR 
-Dtests.timezone=Cuba -Dtests.asserts=true -Dtests.file.encoding=ISO-8859-1
   [junit4] ERROR   0.00s J2 | TestReplicationHandlerDiskOverFlow (suite) <<<
   [junit4]> Throwable #1: java.lang.AssertionError: ObjectTracker found 1 
object(s) that were not released!!! [InternalHttpClient]
   [junit4]   2> NOTE: reproduce with: ant test  
-Dtestcase=CdcrVersionReplicationTest -Dtests.seed=33B6ECFD73638B2D 
-Dtests.slow=true -Dtests.badapples=true -Dtests.locale=ko-KR 
-Dtests.timezone=Asia/Damascus -Dtests.asserts=true 
-Dtests.file.encoding=ISO-8859-1
   [junit4] ERROR   0.00s J3 | CdcrVersionReplicationTest (suite) <<<
   [junit4]> Throwable #1: java.lang.AssertionError: ObjectTracker found 4 
object(s) that were not released!!! [InternalHttpClient, InternalHttpClient, 
InternalHttpClient, InternalHttpClient]
   [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=CdcrBootstrapTest 
-Dtests.seed=33B6ECFD73638B2D -Dtests.slow=true -Dtests.badapples=true 
-Dtests.locale=sr-Latn-RS -Dtests.timezone=America/Danmarkshavn 
-Dtests.asserts=true -Dtests.file.encoding=ISO-8859-1
   [junit4] ERROR   0.00s J0 | CdcrBootstrapTest (suite) <<<
   [junit4]> Throwable #1: java.lang.AssertionError: ObjectTracker found 11 
object(s) that were not released!!! [SolrZkClient, InternalHttpClient, 
ZkStateReader, ZkStateReader,

 {noformat}

> For components loaded from packages SolrCoreAware, ResourceLoaderAware are 
> not honored
> --
>
> Key: SOLR-14525
> URL: https://issues.apache.org/jira/browse/SOLR-14525
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: packages
>Reporter: Noble Paul
>Assignee: Noble Paul
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> inform() methods are not invoked if the plugins are loaded from packages






[GitHub] [lucene-solr] dsmiley commented on a change in pull request #1527: SOLR-14384 Stack SolrRequestInfo

2020-06-03 Thread GitBox


dsmiley commented on a change in pull request #1527:
URL: https://github.com/apache/lucene-solr/pull/1527#discussion_r434752288



##
File path: solr/core/src/java/org/apache/solr/request/SolrRequestInfo.java
##
@@ -52,35 +56,60 @@
   private static final Logger log = 
LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
 
   public static SolrRequestInfo getRequestInfo() {
-return threadLocal.get();
+Deque<SolrRequestInfo> stack = threadLocal.get();
+if (stack.isEmpty()) return null;
+return stack.peek();
   }
 
+  /** Adds the SolrRequestInfo onto the stack provided that the stack has not reached MAX_STACK_SIZE */
   public static void setRequestInfo(SolrRequestInfo info) {
-// TODO: temporary sanity check... this can be changed to just an assert 
in the future
-SolrRequestInfo prev = threadLocal.get();
-if (prev != null) {
-  log.error("Previous SolrRequestInfo was not closed!  req={}", 
prev.req.getOriginalParams());
-  log.error("prev == info : {}", prev.req == info.req, new 
RuntimeException());
+Deque<SolrRequestInfo> stack = threadLocal.get();
+if (info == null) {
+  throw new IllegalArgumentException("SolrRequestInfo is null");
+} else {
+  if (stack.size() <= MAX_STACK_SIZE) {
+stack.push(info);
+  } else {
+assert true : "SolrRequestInfo Stack is full";

Review comment:
   assert false
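
   For clarity, the hunk with the suggested one-word fix applied (a sketch):

{code:java}
if (stack.size() <= MAX_STACK_SIZE) {
  stack.push(info);
} else {
  // 'assert true' can never fire; 'assert false' fails fast when assertions are enabled (-ea)
  assert false : "SolrRequestInfo Stack is full";
  log.error("SolrRequestInfo Stack is full");
}
{code}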

##
File path: solr/core/src/java/org/apache/solr/request/SolrRequestInfo.java
##
@@ -52,35 +56,60 @@
   private static final Logger log = 
LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
 
   public static SolrRequestInfo getRequestInfo() {
-return threadLocal.get();
+Deque<SolrRequestInfo> stack = threadLocal.get();
+if (stack.isEmpty()) return null;
+return stack.peek();
   }
 
+  /** Adds the SolrRequestInfo onto the stack provided that the stack has not reached MAX_STACK_SIZE */
   public static void setRequestInfo(SolrRequestInfo info) {
-// TODO: temporary sanity check... this can be changed to just an assert 
in the future
-SolrRequestInfo prev = threadLocal.get();
-if (prev != null) {
-  log.error("Previous SolrRequestInfo was not closed!  req={}", 
prev.req.getOriginalParams());
-  log.error("prev == info : {}", prev.req == info.req, new 
RuntimeException());
+Deque<SolrRequestInfo> stack = threadLocal.get();
+if (info == null) {
+  throw new IllegalArgumentException("SolrRequestInfo is null");
+} else {
+  if (stack.size() <= MAX_STACK_SIZE) {
+stack.push(info);
+  } else {
+assert true : "SolrRequestInfo Stack is full";
+log.error("SolrRequestInfo Stack is full");
+  }
 }
-assert prev == null;
-
-threadLocal.set(info);
   }
 
+  /** Removes the most recent SolrRequestInfo from the stack */
   public static void clearRequestInfo() {
-try {
-  SolrRequestInfo info = threadLocal.get();
-  if (info != null && info.closeHooks != null) {
-for (Closeable hook : info.closeHooks) {
-  try {
-hook.close();
-  } catch (Exception e) {
-SolrException.log(log, "Exception during close hook", e);
-  }
+Deque<SolrRequestInfo> stack = threadLocal.get();
+if (stack.isEmpty()) {
+  log.error("clearRequestInfo called too many times");
+} else {
+  SolrRequestInfo info = stack.pop();
+  closeHooks(info);
+}
+  }
+
+  /**
+   * This reset method is more of a protection mechanism as
+   * we expect it to be empty by now because all "set" calls need to be 
balanced with a "clear".
+   */
+  public static void reset() {
+Deque<SolrRequestInfo> stack = threadLocal.get();
+boolean isEmpty = stack.isEmpty();
+while (!stack.isEmpty()) {
+  SolrRequestInfo info = stack.pop();
+  closeHooks(info);
+}
+assert isEmpty : "SolrRequestInfo Stack should have been cleared.";
+  }
+
+  private static void closeHooks(SolrRequestInfo info) {
+if (info != null && info.closeHooks != null) {

Review comment:
   but it cannot be null any more?
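
   If callers can indeed no longer pass null, the helper could drop that check - a sketch of what the reviewer is hinting at, not the committed code:

{code:java}
private static void closeHooks(SolrRequestInfo info) {
  if (info.closeHooks == null) return; // info itself is assumed non-null here
  for (Closeable hook : info.closeHooks) {
    try {
      hook.close();
    } catch (Exception e) {
      SolrException.log(log, "Exception during close hook", e);
    }
  }
}
{code}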

##
File path: solr/core/src/java/org/apache/solr/request/SolrRequestInfo.java
##
@@ -52,35 +56,60 @@
   private static final Logger log = 
LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
 
   public static SolrRequestInfo getRequestInfo() {
-return threadLocal.get();
+Deque<SolrRequestInfo> stack = threadLocal.get();
+if (stack.isEmpty()) return null;
+return stack.peek();
   }
 
+  /** Adds the SolrRequestInfo onto the stack provided that the stack has not reached MAX_STACK_SIZE */
   public static void setRequestInfo(SolrRequestInfo info) {
-// TODO: temporary sanity check... this can be changed to just an assert 
in the future
-SolrRequestInfo prev = threadLocal.get();
-if (prev != null) {
-  log.error("Previous SolrRequestInfo was not closed!  req={}", 
prev.req.getOriginalParams());
-  log.error("prev == info : {}", prev.req == info.req, new 
RuntimeException());
+Deque<SolrRequestInfo> stack = threadLocal.get();

[jira] [Commented] (SOLR-14476) Add percentiles and standard deviation aggregations to stats, facet and timeseries Streaming Expressions

2020-06-03 Thread Joel Bernstein (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17125183#comment-17125183
 ] 

Joel Bernstein commented on SOLR-14476:
---

The commits don't appear on this ticket but this work was committed to master:

[https://github.com/apache/lucene-solr/commit/16aad55369d285fec96425f996984a9f4afe28e4]

[https://github.com/apache/lucene-solr/commit/a795047c6ca54e221c743e78880cd93b752b30fb]

> Add percentiles and standard deviation aggregations to stats, facet and 
> timeseries Streaming Expressions
> 
>
> Key: SOLR-14476
> URL: https://issues.apache.org/jira/browse/SOLR-14476
> Project: Solr
>  Issue Type: New Feature
>  Components: streaming expressions
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>Priority: Major
> Attachments: SOLR-14476.patch, SOLR-14476.patch, SOLR-14476.patch, 
> SOLR-14476.patch, SOLR-14476.patch, SOLR-14476.patch, SOLR-14476.patch, 
> SOLR-14476.patch
>
>
> This ticket will add the *per* (percentile) and *std* (standard deviation) 
> aggregations to the *stats*, *facet* and *timeseries* Streaming Expressions. 
> Syntax:
>  
> {code:java}
> facet(logs, buckets="collection_s", per(qtime_i, 50), std(qtime_i)) {code}
> The stats function will also be reimplemented using JSON facets rather than 
> the stats component as part of this ticket. The main reason is that the JSON 
> facets syntax is easier to work with for percentiles, but it also 
> standardizes how aggregations are pushed down to JSON facets.
> In a separate ticket *per* and *std* aggregations will be added to the 
> *rollup*, *hashRollup* and *nodes* Streaming Expressions.
>  






[jira] [Comment Edited] (SOLR-14476) Add percentiles and standard deviation aggregations to stats, facet and timeseries Streaming Expressions

2020-06-03 Thread Joel Bernstein (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17125183#comment-17125183
 ] 

Joel Bernstein edited comment on SOLR-14476 at 6/3/20, 6:06 PM:


The commits don't appear on this ticket but this work was committed to master:

[https://github.com/apache/lucene-solr/commit/16aad55369d285fec96425f996984a9f4afe28e4]

[https://github.com/apache/lucene-solr/commit/a795047c6ca54e221c743e78880cd93b752b30fb]

And branch_8x:

[https://github.com/apache/lucene-solr/commit/286b75097fe830593779a1df2bd0eb3897f84089]

[https://github.com/apache/lucene-solr/commit/70de3df047a72f419af257c8c6437d6d5267f917]

[https://github.com/apache/lucene-solr/commit/6ed9cba6d83c94aeaa89ad9fe6fcfcff013fbb14]

 

 


was (Author: joel.bernstein):
The commits don't appear on this ticket but this work was committed to master:

[https://github.com/apache/lucene-solr/commit/16aad55369d285fec96425f996984a9f4afe28e4]

[https://github.com/apache/lucene-solr/commit/a795047c6ca54e221c743e78880cd93b752b30fb]

> Add percentiles and standard deviation aggregations to stats, facet and 
> timeseries Streaming Expressions
> 
>
> Key: SOLR-14476
> URL: https://issues.apache.org/jira/browse/SOLR-14476
> Project: Solr
>  Issue Type: New Feature
>  Components: streaming expressions
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>Priority: Major
> Attachments: SOLR-14476.patch, SOLR-14476.patch, SOLR-14476.patch, 
> SOLR-14476.patch, SOLR-14476.patch, SOLR-14476.patch, SOLR-14476.patch, 
> SOLR-14476.patch
>
>
> This ticket will add the *per* (percentile) and *std* (standard deviation) 
> aggregations to the *stats*, *facet* and *timeseries* Streaming Expressions. 
> Syntax:
>  
> {code:java}
> facet(logs, buckets="collection_s", per(qtime_i, 50), std(qtime_i)) {code}
> The stats function will also be reimplemented using JSON facets rather than 
> the stats component as part of this ticket. The main reason is that the JSON 
> facets syntax is easier to work with for percentiles, but it also 
> standardizes how aggregations are pushed down to JSON facets.
> In a separate ticket *per* and *std* aggregations will be added to the 
> *rollup*, *hashRollup* and *nodes* Streaming Expressions.
>  






[jira] [Commented] (SOLR-14476) Add percentiles and standard deviation aggregations to stats, facet and timeseries Streaming Expressions

2020-06-03 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17125195#comment-17125195
 ] 

ASF subversion and git services commented on SOLR-14476:


Commit 90039fc9bc52b3e648b174ee450f32ca71ae4291 in lucene-solr's branch 
refs/heads/master from Joel Bernstein
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=90039fc ]

SOLR-14476: Add percentiles and standard deviation aggregations to stats, facet 
and timeseries Streaming Expressions


> Add percentiles and standard deviation aggregations to stats, facet and 
> timeseries Streaming Expressions
> 
>
> Key: SOLR-14476
> URL: https://issues.apache.org/jira/browse/SOLR-14476
> Project: Solr
>  Issue Type: New Feature
>  Components: streaming expressions
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>Priority: Major
> Attachments: SOLR-14476.patch, SOLR-14476.patch, SOLR-14476.patch, 
> SOLR-14476.patch, SOLR-14476.patch, SOLR-14476.patch, SOLR-14476.patch, 
> SOLR-14476.patch
>
>
> This ticket will add the *per* (percentile) and *std* (standard deviation) 
> aggregations to the *stats*, *facet* and *timeseries* Streaming Expressions. 
> Syntax:
>  
> {code:java}
> facet(logs, buckets="collection_s", per(qtime_i, 50), std(qtime_i)) {code}
> The stats function will also be reimplemented using JSON facets rather than 
> the stats component as part of this ticket. The main reason is that the JSON 
> facets syntax is easier to work with for percentiles, but it also 
> standardizes how aggregations are pushed down to JSON facets.
> In a separate ticket *per* and *std* aggregations will be added to the 
> *rollup*, *hashRollup* and *nodes* Streaming Expressions.
>  






[jira] [Commented] (SOLR-14476) Add percentiles and standard deviation aggregations to stats, facet and timeseries Streaming Expressions

2020-06-03 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17125202#comment-17125202
 ] 

ASF subversion and git services commented on SOLR-14476:


Commit e327f08adea1c4273043986ab53c18b1f4b97556 in lucene-solr's branch 
refs/heads/branch_8x from Joel Bernstein
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=e327f08 ]

SOLR-14476: Add percentiles and standard deviation aggregations to stats, facet 
and timeseries Streaming Expressions


> Add percentiles and standard deviation aggregations to stats, facet and 
> timeseries Streaming Expressions
> 
>
> Key: SOLR-14476
> URL: https://issues.apache.org/jira/browse/SOLR-14476
> Project: Solr
>  Issue Type: New Feature
>  Components: streaming expressions
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>Priority: Major
> Attachments: SOLR-14476.patch, SOLR-14476.patch, SOLR-14476.patch, 
> SOLR-14476.patch, SOLR-14476.patch, SOLR-14476.patch, SOLR-14476.patch, 
> SOLR-14476.patch
>
>
> This ticket will add the *per* (percentile) and *std* (standard deviation) 
> aggregations to the *stats*, *facet* and *timeseries* Streaming Expressions. 
> Syntax:
>  
> {code:java}
> facet(logs, buckets="collection_s", per(qtime_i, 50), std(qtime_i)) {code}
> The stats function will also be reimplemented using JSON facets rather than 
> the stats component as part of this ticket. The main reason is that the JSON 
> facets syntax is easier to work with for percentiles, but it also 
> standardizes how aggregations are pushed down to JSON facets.
> In a separate ticket *per* and *std* aggregations will be added to the 
> *rollup*, *hashRollup* and *nodes* Streaming Expressions.
>  






[jira] [Resolved] (SOLR-14476) Add percentiles and standard deviation aggregations to stats, facet and timeseries Streaming Expressions

2020-06-03 Thread Joel Bernstein (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joel Bernstein resolved SOLR-14476.
---
Fix Version/s: 8.6
   Resolution: Resolved

> Add percentiles and standard deviation aggregations to stats, facet and 
> timeseries Streaming Expressions
> 
>
> Key: SOLR-14476
> URL: https://issues.apache.org/jira/browse/SOLR-14476
> Project: Solr
>  Issue Type: New Feature
>  Components: streaming expressions
>Reporter: Joel Bernstein
>Assignee: Joel Bernstein
>Priority: Major
> Fix For: 8.6
>
> Attachments: SOLR-14476.patch, SOLR-14476.patch, SOLR-14476.patch, 
> SOLR-14476.patch, SOLR-14476.patch, SOLR-14476.patch, SOLR-14476.patch, 
> SOLR-14476.patch
>
>
> This ticket will add the *per* (percentile) and *std* (standard deviation) 
> aggregations to the *stats*, *facet* and *timeseries* Streaming Expressions. 
> Syntax:
>  
> {code:java}
> facet(logs, buckets="collection_s", per(qtime_i, 50), std(qtime_i)) {code}
> The stats function will also be reimplemented using JSON facets rather than 
> the stats component as part of this ticket. The main reason is that the JSON 
> facets syntax is easier to work with for percentiles, but it also 
> standardizes how aggregations are pushed down to JSON facets.
> In a separate ticket *per* and *std* aggregations will be added to the 
> *rollup*, *hashRollup* and *nodes* Streaming Expressions.
>  






[GitHub] [lucene-solr] dweiss commented on a change in pull request #1550: LUCENE-9383: benchmark module: Gradle conversion (complete)

2020-06-03 Thread GitBox


dweiss commented on a change in pull request #1550:
URL: https://github.com/apache/lucene-solr/pull/1550#discussion_r434805356



##
File path: lucene/benchmark/build.gradle
##
@@ -37,5 +37,121 @@ dependencies {
 exclude module: "xml-apis"
   })
 
+  runtimeOnly project(':lucene:analysis:icu')
+
   testImplementation project(':lucene:test-framework')
 }
+
+def tempDir = file("temp")
+def workDir = file("work")
+
+task run(type: JavaExec) {
+  description "Run a perf test (optional: -PtaskAlg=conf/your-algorithm-file -PmaxHeapSize=1G)"
+  main 'org.apache.lucene.benchmark.byTask.Benchmark'
+  classpath sourceSets.main.runtimeClasspath
+  // allow these to be specified on the CLI via -PtaskAlg=  for example
+  def taskAlg = propertyOrDefault('taskAlg', 'conf/micro-standard.alg')
+  args = [taskAlg]
+
+  maxHeapSize = propertyOrDefault('maxHeapSize', '1G')
+
+  String stdOutStr = propertyOrDefault('standardOutput', null)

Review comment:
   Sure.








[GitHub] [lucene-solr] dweiss commented on a change in pull request #1550: LUCENE-9383: benchmark module: Gradle conversion (complete)

2020-06-03 Thread GitBox


dweiss commented on a change in pull request #1550:
URL: https://github.com/apache/lucene-solr/pull/1550#discussion_r434805283



##
File path: lucene/benchmark/build.gradle
##
@@ -15,13 +15,13 @@
  * limitations under the License.
  */
 
-
-apply plugin: 'java-library'
+apply plugin: 'java'
+// NOT a 'java-library'.  Maybe 'application' but seems too limiting.

Review comment:
   From my (seasoned) gradle viewpoint this comment really doesn't make much sense: it's not an "application" in the gradle sense - we launch multiple classes and have infrastructure in the build file, not a single main class, etc. But fine with me.








[jira] [Created] (LUCENE-9390) Kuromoji tokenizer discards tokens if they start with a punctuation character

2020-06-03 Thread Jim Ferenczi (Jira)
Jim Ferenczi created LUCENE-9390:


 Summary: Kuromoji tokenizer discards tokens if they start with a 
punctuation character
 Key: LUCENE-9390
 URL: https://issues.apache.org/jira/browse/LUCENE-9390
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Jim Ferenczi


This issue was first raised in Elasticsearch here.

The unidic dictionary that is used by the Kuromoji tokenizer contains entries that mix punctuation and other characters. For instance, the following entry:

_(株),1285,1285,3690,名詞,一般,*,*,*,*,(株),カブシキガイシャ,カブシキガイシャ_

can be found in the Noun.csv file.

Today, tokens that start with punctuation are automatically removed by default (discardPunctuation is true). I think the code was written this way because we expect punctuation to be separated from normal tokens, but there are exceptions in the original dictionary. Maybe we should check the entire token when discarding punctuation?
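
A hypothetical sketch of that whole-token check (isPunctuation stands in for the tokenizer's existing per-character test; the names are illustrative only):

{code:java}
// discard a token only when every character in it is punctuation,
// so dictionary entries such as "(株)" survive
private static boolean isAllPunctuation(char[] buffer, int offset, int length) {
  for (int i = offset; i < offset + length; i++) {
    if (!isPunctuation(buffer[i])) {
      return false;
    }
  }
  return true;
}
{code}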

 

 






[jira] [Updated] (LUCENE-9390) Kuromoji tokenizer discards tokens if they start with a punctuation character

2020-06-03 Thread Jim Ferenczi (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Ferenczi updated LUCENE-9390:
-
Description: 
This issue was first raised in Elasticsearch 
[here|[https://github.com/elastic/elasticsearch/issues/57614]|https://github.com/elastic/elasticsearch/issues/57614]

The unidic dictionary that is used by the Kuromoji tokenizer contains entries that mix punctuation and other characters. For instance, the following entry:

_(株),1285,1285,3690,名詞,一般,*,*,*,*,(株),カブシキガイシャ,カブシキガイシャ_

can be found in the Noun.csv file.

Today, tokens that start with punctuation are automatically removed by default (discardPunctuation is true). I think the code was written this way because we expect punctuation to be separated from normal tokens, but there are exceptions in the original dictionary. Maybe we should check the entire token when discarding punctuation?

 

 

  was:
This issue was first raised in Elasticsearch here.

The unidic dictionary that is used by the Kuromoji tokenizer contains entries that mix punctuation and other characters. For instance, the following entry:

_(株),1285,1285,3690,名詞,一般,*,*,*,*,(株),カブシキガイシャ,カブシキガイシャ_

can be found in the Noun.csv file.

Today, tokens that start with punctuation are automatically removed by default (discardPunctuation is true). I think the code was written this way because we expect punctuation to be separated from normal tokens, but there are exceptions in the original dictionary. Maybe we should check the entire token when discarding punctuation?

 

 


> Kuromoji tokenizer discards tokens if they start with a punctuation character
> -
>
> Key: LUCENE-9390
> URL: https://issues.apache.org/jira/browse/LUCENE-9390
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Jim Ferenczi
>Priority: Minor
>
> This issue was first raised in Elasticsearch 
> [here|[https://github.com/elastic/elasticsearch/issues/57614]|https://github.com/elastic/elasticsearch/issues/57614]
> The unidic dictionary that is used by the Kuromoji tokenizer contains entries that mix punctuation and other characters. For instance, the following entry:
> _(株),1285,1285,3690,名詞,一般,*,*,*,*,(株),カブシキガイシャ,カブシキガイシャ_
> can be found in the Noun.csv file.
> Today, tokens that start with punctuation are automatically removed by default (discardPunctuation is true). I think the code was written this way because we expect punctuation to be separated from normal tokens, but there are exceptions in the original dictionary. Maybe we should check the entire token when discarding punctuation?
>  
>  






[jira] [Updated] (LUCENE-9390) Kuromoji tokenizer discards tokens if they start with a punctuation character

2020-06-03 Thread Jim Ferenczi (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jim Ferenczi updated LUCENE-9390:
-
Description: 
This issue was first raised in Elasticsearch 
[here|https://github.com/elastic/elasticsearch/issues/57614]

The unidic dictionary that is used by the Kuromoji tokenizer contains entries that mix punctuation and other characters. For instance, the following entry:

_(株),1285,1285,3690,名詞,一般,*,*,*,*,(株),カブシキガイシャ,カブシキガイシャ_

can be found in the Noun.csv file.

Today, tokens that start with punctuation are automatically removed by default (discardPunctuation is true). I think the code was written this way because we expect punctuation to be separated from normal tokens, but there are exceptions in the original dictionary. Maybe we should check the entire token when discarding punctuation?

 

 

  was:
This issue was first raised in Elasticsearch 
[here|[https://github.com/elastic/elasticsearch/issues/57614]|https://github.com/elastic/elasticsearch/issues/57614]

The unidic dictionary that is used by the Kuromoji tokenizer contains entries that mix punctuation and other characters. For instance, the following entry:

_(株),1285,1285,3690,名詞,一般,*,*,*,*,(株),カブシキガイシャ,カブシキガイシャ_

can be found in the Noun.csv file.

Today, tokens that start with punctuation are automatically removed by default (discardPunctuation is true). I think the code was written this way because we expect punctuation to be separated from normal tokens, but there are exceptions in the original dictionary. Maybe we should check the entire token when discarding punctuation?

 

 


> Kuromoji tokenizer discards tokens if they start with a punctuation character
> -
>
> Key: LUCENE-9390
> URL: https://issues.apache.org/jira/browse/LUCENE-9390
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Jim Ferenczi
>Priority: Minor
>
> This issue was first raised in Elasticsearch 
> [here|https://github.com/elastic/elasticsearch/issues/57614]
> The unidic dictionary that is used by the Kuromoji tokenizer contains entries that mix punctuation and other characters. For instance, the following entry:
> _(株),1285,1285,3690,名詞,一般,*,*,*,*,(株),カブシキガイシャ,カブシキガイシャ_
> can be found in the Noun.csv file.
> Today, tokens that start with punctuation are automatically removed by default (discardPunctuation is true). I think the code was written this way because we expect punctuation to be separated from normal tokens, but there are exceptions in the original dictionary. Maybe we should check the entire token when discarding punctuation?
>  
>  






[GitHub] [lucene-solr] msokolov opened a new pull request #1552: LUCENE-8962

2020-06-03 Thread GitBox


msokolov opened a new pull request #1552:
URL: https://github.com/apache/lucene-solr/pull/1552


   This PR revisits the merge-on-commit patch submitted by @msfroh a little while ago. The only changes from that earlier PR are a fix for failures uncovered by TestIndexWriter.testRandomOperations, some whitespace cleanups, and a rebase on the current master branch. The problem was that updateSegmentInfosOnMergeFinish would incorrectly decRef a merged segment's files if that segment was modified by deletions (or updates) while it was being merged.

   With this fix, I ran the failing test case several thousand times with no failures, whereas before it would routinely fail after a few hundred test runs.






[jira] [Commented] (LUCENE-8962) Can we merge small segments during refresh, for faster searching?

2020-06-03 Thread Michael Sokolov (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-8962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17125282#comment-17125282
 ] 

Michael Sokolov commented on LUCENE-8962:
-

Posted a new PR that fixes the test failures we were seeing: 
[https://github.com/apache/lucene-solr/pull/1552]. For some reason it's not linked above, and I'm not sure how to remedy that.

> Can we merge small segments during refresh, for faster searching?
> -
>
> Key: LUCENE-8962
> URL: https://issues.apache.org/jira/browse/LUCENE-8962
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/index
>Reporter: Michael McCandless
>Priority: Major
> Fix For: 8.6
>
> Attachments: LUCENE-8962_demo.png, failed-tests.patch
>
>  Time Spent: 9h 40m
>  Remaining Estimate: 0h
>
> With near-real-time search we ask {{IndexWriter}} to write all in-memory 
> segments to disk and open an {{IndexReader}} to search them, and this is 
> typically a quick operation.
> However, when you use many threads for concurrent indexing, {{IndexWriter}} 
> will accumulate write many small segments during {{refresh}} and this then 
> adds search-time cost as searching must visit all of these tiny segments.
> The merge policy would normally quickly coalesce these small segments if 
> given a little time ... so, could we somehow improve {{IndexWriter'}}s 
> refresh to optionally kick off merge policy to merge segments below some 
> threshold before opening the near-real-time reader?  It'd be a bit tricky 
> because while we are waiting for merges, indexing may continue, and new 
> segments may be flushed, but those new segments shouldn't be included in the 
> point-in-time segments returned by refresh ...
> One could almost do this on top of Lucene today, with a custom merge policy, 
> and some hackity logic to have the merge policy target small segments just 
> written by refresh, but it's tricky to then open a near-real-time reader, 
> excluding newly flushed but including newly merged segments since the refresh 
> originally finished ...
> I'm not yet sure how best to solve this, so I wanted to open an issue for 
> discussion!






[jira] [Created] (SOLR-14534) Investigate cleaning up any remaining warnings in 8x

2020-06-03 Thread Erick Erickson (Jira)
Erick Erickson created SOLR-14534:
-

 Summary: Investigate cleaning up any remaining warnings in 8x
 Key: SOLR-14534
 URL: https://issues.apache.org/jira/browse/SOLR-14534
 Project: Solr
  Issue Type: Sub-task
Reporter: Erick Erickson


There will be some divergence between master and 8x. The current pattern is
1> clean up warnings in master
2> backport to 8x and ensure all tests etc. run.

Conspicuously missing is compiling under 8x and ensuring that there are no warnings in the cleaned code.

I'm not sure I really will do this if it turns out there are a lot of them. It's good enough that master is (and stays) clean IMO. OTOH, it may be worth it if it only takes a short time. Won't be able to tell until we get the code clean.






[GitHub] [lucene-solr] madrob commented on a change in pull request #1548: SOLR-14524: Harden MultiThreadedOCPTest testFillWorkQueue()

2020-06-03 Thread GitBox


madrob commented on a change in pull request #1548:
URL: https://github.com/apache/lucene-solr/pull/1548#discussion_r434826708



##
File path: solr/core/src/test/org/apache/solr/cloud/MultiThreadedOCPTest.java
##
@@ -77,42 +76,68 @@ private void testFillWorkQueue() throws Exception {
 distributedQueue.offer(Utils.toJSON(Utils.makeMap(
 "collection", "A_COLL",
 QUEUE_OPERATION, MOCK_COLL_TASK.toLower(),
-ASYNC, String.valueOf(i),
+ASYNC, Integer.toString(i),
 
-"sleep", (i == 0 ? "1000" : "1") //first task waits for 1 second, 
and thus blocking
-// all other tasks. Subsequent tasks only wait for 1ms
+// third task waits for a long time, and thus blocks the queue for 
all other tasks for A_COLL.
+// Subsequent tasks as well as the first two only wait for 1ms
+"sleep", (i == 2 ? "1" : "1")
 )));
 log.info("MOCK task added {}", i);
-
   }
-  Thread.sleep(100);//wait and post the next message
 
-  //this is not going to be blocked because it operates on another 
collection
+  // Wait until we see the second A_COLL task getting processed (assuming 
the first got processed as well)
+  Long task1CollA = waitForTaskToCompleted(client, 1);
+
+  assertNotNull("Queue did not process first two tasks on A_COLL, can't 
run test", task1CollA);
+
+  // Make sure the long running task did not finish, otherwise no way the 
B_COLL task can be tested to run in parallel with it
+  assertNull("Long running task finished too early, can't test", 
checkTaskHasCompleted(client, 2));
+
+  // Enqueue a task on another collection not competing with the lock on 
A_COLL and see that it can be executed right away
   distributedQueue.offer(Utils.toJSON(Utils.makeMap(
   "collection", "B_COLL",
   QUEUE_OPERATION, MOCK_COLL_TASK.toLower(),
   ASYNC, "200",
   "sleep", "1"
   )));
 
+  // We now check that either the B_COLL task has completed before the 
third (long running) task on A_COLL,
+  // Or if both have completed (if this check got significantly delayed 
for some reason), we verify B_COLL was first.
+  Long taskCollB = waitForTaskToCompleted(client, 200);
 
-  Long acoll = null, bcoll = null;
-  for (int i = 0; i < 500; i++) {
-if (bcoll == null) {
-  CollectionAdminResponse statusResponse = getStatusResponse("200", 
client);
-  bcoll = (Long) statusResponse.getResponse().get("MOCK_FINISHED");
-}
-if (acoll == null) {
-  CollectionAdminResponse statusResponse = getStatusResponse("2", 
client);
-  acoll = (Long) statusResponse.getResponse().get("MOCK_FINISHED");
-}
-if (acoll != null && bcoll != null) break;
-Thread.sleep(100);
+  // We do not wait for the long running task to finish, that would be a 
waste of time.
+  Long task2CollA = checkTaskHasCompleted(client, 2);
+
+  // Given the wait delay (500 iterations of 100ms), the task has plenty 
of time to complete, so this is not expected.
+  assertNotNull("Task on  B_COLL did not complete, can't test", taskCollB);
+  // We didn't wait for the 3rd A_COLL task to complete (test can run 
quickly) but if it did, we expect the B_COLL to have finished first.
+  assertTrue("task2CollA: " + task2CollA + " taskCollB: " + taskCollB, 
task2CollA  == null || task2CollA > taskCollB);
+}
+  }
+
+  /**
+   * Verifies the status of an async task submitted to the Overseer Collection 
queue.
+   * @return null if the task has not completed, the completion 
timestamp if the task has completed
+   * (see {@link 
org.apache.solr.cloud.api.collections.OverseerCollectionMessageHandler#mockOperation}).

Review comment:
   nit: javadoc complains about this not being a visible reference
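
   One common way to satisfy javadoc here (an assumption - the PR may have resolved the nit differently) is to reference the non-visible member with @code instead of @link, since @code does not require the target to be visible:

{code:java}
/**
 * Verifies the status of an async task submitted to the Overseer Collection queue.
 * @return null if the task has not completed, the completion timestamp if it has
 * (see {@code org.apache.solr.cloud.api.collections.OverseerCollectionMessageHandler#mockOperation}).
 */
{code}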








[GitHub] [lucene-solr] madrob commented on pull request #1548: SOLR-14524: Harden MultiThreadedOCPTest testFillWorkQueue()

2020-06-03 Thread GitBox


madrob commented on pull request #1548:
URL: https://github.com/apache/lucene-solr/pull/1548#issuecomment-638436905


   LGTM, one minor nit. if you can take care of that please, I'll be happy to 
merge.
   
   cc: @ErickErickson 






[GitHub] [lucene-solr] ErickErickson commented on pull request #1548: SOLR-14524: Harden MultiThreadedOCPTest testFillWorkQueue()

2020-06-03 Thread GitBox


ErickErickson commented on pull request #1548:
URL: https://github.com/apache/lucene-solr/pull/1548#issuecomment-638438610


   Thanks, Mike, I’ll leave it in your capable hands. And thanks again Ilan...
   
   > On Jun 3, 2020, at 4:14 PM, Mike Drob  wrote:
   > 
   > 
   > LGTM, one minor nit. if you can take care of that please, I'll be happy to 
merge.
   > 
   > cc: @ErickErickson
   > 
   > —
   > You are receiving this because you were mentioned.
   > Reply to this email directly, view it on GitHub, or unsubscribe.
   > 
   
   






[jira] [Commented] (LUCENE-9365) Fuzzy query has a false negative when prefix length == search term length

2020-06-03 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17125284#comment-17125284
 ] 

ASF subversion and git services commented on LUCENE-9365:
-

Commit 45611d0647b860700e2ebd52c7c4695027c5c890 in lucene-solr's branch 
refs/heads/master from Mike Drob
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=45611d0 ]

LUCENE-9365 FuzzyQuery false negative when prefix length == search term length 
(#1545)

Co-Authored-By: markharwood 

> Fuzzy query has a false negative when prefix length == search term length 
> --
>
> Key: LUCENE-9365
> URL: https://issues.apache.org/jira/browse/LUCENE-9365
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/query/scoring
>Reporter: Mark Harwood
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When using FuzzyQuery the search string `bba` does not match doc value `bbab` 
> with an edit distance of 1 and prefix length of 3.
> In FuzzyQuery an automaton is created for the "suffix" part of the search 
> string which in this case is an empty string.
> In this scenario maybe the FuzzyQuery should rewrite to a WildcardQuery of 
> the following form :
> {code:java}
> searchString + "?" 
> {code}
> .. where there's an appropriate number of ? characters according to the edit 
> distance.
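
A sketch of the rewrite the reporter proposes (illustrative only: field and maxEdits are assumed names, and this is not necessarily the fix that was committed):

{code:java}
// when prefixLength == searchString.length(), pad with one '?' per allowed edit
Term term = new Term(field, searchString + "?".repeat(maxEdits));
Query rewritten = new WildcardQuery(term);
{code}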






[GitHub] [lucene-solr] madrob merged pull request #1545: LUCENE-9365 FuzzyQuery false negative

2020-06-03 Thread GitBox


madrob merged pull request #1545:
URL: https://github.com/apache/lucene-solr/pull/1545


   






[jira] [Resolved] (LUCENE-9365) Fuzzy query has a false negative when prefix length == search term length

2020-06-03 Thread Mike Drob (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Drob resolved LUCENE-9365.
---
Fix Version/s: master (9.0)
 Assignee: Mike Drob
   Resolution: Fixed

> Fuzzy query has a false negative when prefix length == search term length 
> --
>
> Key: LUCENE-9365
> URL: https://issues.apache.org/jira/browse/LUCENE-9365
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/query/scoring
>Reporter: Mark Harwood
>Assignee: Mike Drob
>Priority: Major
> Fix For: master (9.0)
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When using FuzzyQuery the search string `bba` does not match doc value `bbab` 
> with an edit distance of 1 and prefix length of 3.
> In FuzzyQuery an automaton is created for the "suffix" part of the search 
> string which in this case is an empty string.
> In this scenario maybe the FuzzyQuery should rewrite to a WildcardQuery of 
> the following form :
> {code:java}
> searchString + "?" 
> {code}
> .. where there's an appropriate number of ? characters according to the edit 
> distance.






[jira] [Commented] (LUCENE-9365) Fuzzy query has a false negative when prefix length == search term length

2020-06-03 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17125287#comment-17125287
 ] 

ASF subversion and git services commented on LUCENE-9365:
-

Commit 58958c9531baef80663503c365345fc36d4e1d79 in lucene-solr's branch 
refs/heads/master from Mike Drob
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=58958c9 ]

LUCENE-9365 CHANGES.txt


> Fuzzy query has a false negative when prefix length == search term length 
> --
>
> Key: LUCENE-9365
> URL: https://issues.apache.org/jira/browse/LUCENE-9365
> Project: Lucene - Core
>  Issue Type: Bug
>  Components: core/query/scoring
>Reporter: Mark Harwood
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> When using FuzzyQuery the search string `bba` does not match doc value `bbab` 
> with an edit distance of 1 and prefix length of 3.
> In FuzzyQuery an automaton is created for the "suffix" part of the search 
> string which in this case is an empty string.
> In this scenario maybe the FuzzyQuery should rewrite to a WildcardQuery of 
> the following form :
> {code:java}
> searchString + "?" 
> {code}
> .. where there's an appropriate number of ? characters according to the edit 
> distance.






[GitHub] [lucene-solr] madrob commented on a change in pull request #1539: Fix typos in release wizard

2020-06-03 Thread GitBox


madrob commented on a change in pull request #1539:
URL: https://github.com/apache/lucene-solr/pull/1539#discussion_r434833772



##
File path: dev-tools/scripts/releaseWizard.yaml
##
@@ -1491,13 +1496,13 @@ groups:
 cmd: ant clean
   - !Command
cmd: python3 -u dev-tools/scripts/addBackcompatIndexes.py --no-cleanup --temp-dir {{ temp_dir }} {{ release_version }}  && git add lucene/backward-codecs/src/test/org/apache/lucene/index/
-logfile: add-bakccompat.log
+logfile: add-backcompat.log
   - !Command
-cmd: git diff
+cmd: git diff --staged
 comment: Check the git diff before committing
 tee: true
   - !Command
-cmd: git add -u .  && git commit -m "Add back-compat indices for {{ release_version }}"  && git push
+cmd: git commit -m "Add back-compat indices for {{ release_version }}"  && git push

Review comment:
   Because you already do `git add` on line 1498.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] madrob merged pull request #1539: Fix typos in release wizard

2020-06-03 Thread GitBox


madrob merged pull request #1539:
URL: https://github.com/apache/lucene-solr/pull/1539


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-12823) remove clusterstate.json in Lucene/Solr 8.0

2020-06-03 Thread Erick Erickson (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-12823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17125295#comment-17125295
 ] 

Erick Erickson commented on SOLR-12823:
---

[~murblanc] I'll try to look at this Real Soon Now unless someone beats me to 
it.

> remove clusterstate.json in Lucene/Solr 8.0
> ---
>
> Key: SOLR-12823
> URL: https://issues.apache.org/jira/browse/SOLR-12823
> Project: Solr
>  Issue Type: Task
>Reporter: Varun Thacker
>Priority: Major
>  Time Spent: 3h 50m
>  Remaining Estimate: 0h
>
> clusterstate.json is an artifact of a pre 5.0 Solr release. We should remove 
> that in 8.0
> It stays empty unless you explicitly ask to create the collection with the 
> old "stateFormat" and there is no reason for one to create a collection with 
> the old stateFormat.
> We should also remove the "stateFormat" argument in create collection
> We should also remove MIGRATESTATEVERSION as well
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] madrob commented on pull request #1492: SOLR-11934: Visit Solr logging, it's too noisy.

2020-06-03 Thread GitBox


madrob commented on pull request #1492:
URL: https://github.com/apache/lucene-solr/pull/1492#issuecomment-638449947


   @ErickErickson There are no changes here. Stale PR?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (SOLR-14467) inconsistent server errors combining relatedness() with allBuckets:true

2020-06-03 Thread Chris M. Hostetter (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris M. Hostetter updated SOLR-14467:
--
Status: Patch Available  (was: Open)

> inconsistent server errors combining relatedness() with allBuckets:true
> ---
>
> Key: SOLR-14467
> URL: https://issues.apache.org/jira/browse/SOLR-14467
> Project: Solr
>  Issue Type: Bug
>  Components: Facet Module
>Reporter: Chris M. Hostetter
>Priority: Major
> Attachments: SOLR-14467.patch, SOLR-14467.patch, SOLR-14467.patch, 
> SOLR-14467.patch, SOLR-14467_allBuckets_refine.patch, SOLR-14467_test.patch, 
> SOLR-14467_test.patch, beast.log.txt, beast2.log.txt
>
>
> While working on randomized testing for SOLR-13132 I discovered a variety of 
> different ways that JSON Faceting's "allBuckets" option can fail when 
> combined with the "relatedness()" function.
> I haven't found a trivial way to manually reproduce this, but I have been 
> able to trigger the failures with a trivial patch to {{TestCloudJSONFacetSKG}} 
> which I will attach.
> Based on the nature of the failures it looks like it may have something to do 
> with multiple segments of different sizes and/or resizing the SlotAccs?
> The relatedness() function doesn't have many (any?) existing tests in place 
> that leverage "allBuckets", so this is probably a bug that has always existed 
> -- it's possible it may be excessively cumbersome to fix, and we might 
> need/want to just document that incompatibility and add some code to try and 
> detect if the user combines these options and, if so, fail with a 400 error?
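
To make the failing combination concrete, here is a hedged SolrJ sketch of a 
request that pairs allBuckets:true with relatedness() (field names and the 
fore/back queries are made up for illustration):

{code:java}
import java.util.HashMap;
import java.util.Map;
import org.apache.solr.client.solrj.request.json.JsonQueryRequest;

// Terms facet combining allBuckets:true with a relatedness() sub-facet --
// the combination this issue reports as failing intermittently.
Map<String, Object> subFacets = new HashMap<>();
subFacets.put("skg", "relatedness($fore,$back)");

Map<String, Object> terms = new HashMap<>();
terms.put("type", "terms");
terms.put("field", "cat_s");
terms.put("allBuckets", true);
terms.put("facet", subFacets);

JsonQueryRequest req = new JsonQueryRequest()
    .setQuery("*:*")
    .withParam("fore", "cat_s:A")
    .withParam("back", "*:*")
    .withFacet("cats", terms);
{code}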



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (SOLR-14467) inconsistent server errors combining relatedness() with allBuckets:true

2020-06-03 Thread Chris M. Hostetter (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris M. Hostetter updated SOLR-14467:
--
Attachment: SOLR-14467.patch
Status: Open  (was: Open)

bq. ... I think that would let us remove a lot of the special casing of 
allBuckets in terms of merging? ... like I said, I need to think it through 
more -- I don't want to try and simplify/refactor any of this until test 
beasting seems solid.

Now that SOLR-14520 is fixed and the tests seem solid, I took a crack at this 
idea; see the updated patch.

I think it's a lot cleaner/simpler than having the special BucketData 
singleton for allBuckets -- what do you think [~mgibney], any concerns?

> inconsistent server errors combining relatedness() with allBuckets:true
> ---
>
> Key: SOLR-14467
> URL: https://issues.apache.org/jira/browse/SOLR-14467
> Project: Solr
>  Issue Type: Bug
>  Components: Facet Module
>Reporter: Chris M. Hostetter
>Priority: Major
> Attachments: SOLR-14467.patch, SOLR-14467.patch, SOLR-14467.patch, 
> SOLR-14467.patch, SOLR-14467_allBuckets_refine.patch, SOLR-14467_test.patch, 
> SOLR-14467_test.patch, beast.log.txt, beast2.log.txt
>
>
> While working on randomized testing for SOLR-13132 I discovered a variety of 
> different ways that JSON Faceting's "allBuckets" option can fail when 
> combined with the "relatedness()" function.
> I haven't found a trivial way to manually reproduce this, but I have been 
> able to trigger the failures with a trivial patch to {{TestCloudJSONFacetSKG}} 
> which I will attach.
> Based on the nature of the failures it looks like it may have something to do 
> with multiple segments of different sizes and/or resizing the SlotAccs?
> The relatedness() function doesn't have many (any?) existing tests in place 
> that leverage "allBuckets", so this is probably a bug that has always existed 
> -- it's possible it may be excessively cumbersome to fix, and we might 
> need/want to just document that incompatibility and add some code to try and 
> detect if the user combines these options and, if so, fail with a 400 error?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14525) For components loaded from packages SolrCoreAware, ResourceLoaderAware are not honored

2020-06-03 Thread Noble Paul (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17125398#comment-17125398
 ] 

Noble Paul commented on SOLR-14525:
---

Seems like this affected only 8x and not master.

> For components loaded from packages SolrCoreAware, ResourceLoaderAware are 
> not honored
> --
>
> Key: SOLR-14525
> URL: https://issues.apache.org/jira/browse/SOLR-14525
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: packages
>Reporter: Noble Paul
>Assignee: Noble Paul
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> inform() methods are not invoked if the plugins are loaded from packages
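
For context, a minimal sketch of the contract at issue (the component class is 
hypothetical):

{code:java}
import org.apache.solr.core.SolrCore;
import org.apache.solr.util.plugin.SolrCoreAware;

// A component like this expects inform(SolrCore) to be called once the core
// is ready; per this report, that callback (and the ResourceLoaderAware
// equivalent) is skipped when the component is loaded from a package.
public class MyComponent implements SolrCoreAware {
  @Override
  public void inform(SolrCore core) {
    // core-dependent initialization goes here
  }
}
{code}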



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14525) For components loaded from packages SolrCoreAware, ResourceLoaderAware are not honored

2020-06-03 Thread Noble Paul (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17125428#comment-17125428
 ] 

Noble Paul commented on SOLR-14525:
---

The cherry-pick went wrong: master was right, but the cherry-pick to 8x got it 
wrong.

> For components loaded from packages SolrCoreAware, ResourceLoaderAware are 
> not honored
> --
>
> Key: SOLR-14525
> URL: https://issues.apache.org/jira/browse/SOLR-14525
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: packages
>Reporter: Noble Paul
>Assignee: Noble Paul
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> inform() methods are not invoked if the plugins are loaded from packages



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Created] (LUCENE-9391) Upgrade to HPPC 0.8.2

2020-06-03 Thread Haoyu Zhai (Jira)
Haoyu Zhai created LUCENE-9391:
--

 Summary: Upgrade to HPPC 0.8.2
 Key: LUCENE-9391
 URL: https://issues.apache.org/jira/browse/LUCENE-9391
 Project: Lucene - Core
  Issue Type: Improvement
Reporter: Haoyu Zhai


HPPC 0.8.2 is out and exposes an Accountable-like interface used to estimate 
memory usage.

[https://issues.carrot2.org/secure/ReleaseNote.jspa?projectId=10070&version=13522&styleName=Text]

We should upgrade if any of the components using HPPC need better memory 
estimates.
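
A hedged sketch of what the new accounting could look like; the method names 
are taken from the linked release notes and should be treated as assumptions 
until verified against 0.8.2:

{code:java}
import com.carrotsearch.hppc.IntIntHashMap;

IntIntHashMap map = new IntIntHashMap();
map.put(1, 42);

// Assumed 0.8.2 API: per-container memory estimates.
long used = map.ramBytesUsed();           // bytes actually used
long allocated = map.ramBytesAllocated(); // bytes allocated, incl. slack
{code}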



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14525) For components loaded from packages SolrCoreAware, ResourceLoaderAware are not honored

2020-06-03 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17125459#comment-17125459
 ] 

ASF subversion and git services commented on SOLR-14525:


Commit 5827ddf2fae664a5c014a42a95db14dd2f3cbbf9 in lucene-solr's branch 
refs/heads/branch_8x from Noble Paul
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=5827ddf ]

SOLR-14525: chery pick from master did it wrong


> For components loaded from packages SolrCoreAware, ResourceLoaderAware are 
> not honored
> --
>
> Key: SOLR-14525
> URL: https://issues.apache.org/jira/browse/SOLR-14525
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: packages
>Reporter: Noble Paul
>Assignee: Noble Paul
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> inform() methods are not invoked if the plugins are loaded from packages



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-13458) Make Jetty timeouts configurable system wide

2020-06-03 Thread Alexander Zhideev (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-13458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17125471#comment-17125471
 ] 

Alexander Zhideev commented on SOLR-13458:
--

[~gus] were you able to find any sort of workaround for these intermittent 
timeouts? We are seeing the exact same issue in SolrCloud 7.7.1; it surfaced 
close to the end of the project, and so far no real solution has been proposed 
to us. Sometimes it times out, and other times the exact same query runs with 
no issues. Any tips/pointers would be greatly appreciated, even something 
unconventional like infinite retries or anything else...

> Make Jetty timeouts configurable system wide
> 
>
> Key: SOLR-13458
> URL: https://issues.apache.org/jira/browse/SOLR-13458
> Project: Solr
>  Issue Type: Sub-task
>  Components: SolrCloud
>Affects Versions: master (9.0)
>Reporter: Gus Heck
>Priority: Major
>
> Our Jetty container has several timeouts associated with it, and at least one 
> of these is regularly getting in my way (the idle timeout after 120 sec). I 
> tried setting a system property, with no effect, and I've tried altering the 
> jetty.xml found at solr-install/solr/server/etc/jetty.xml on all (50) 
> machines and rebooting all servers, only to have an exception with the old 
> 120-sec timeout still show up. This ticket proposes that these values are by 
> nature "Global System Timeouts" and should be made configurable in solr.xml 
> (which may be difficult because they will be needed early in the boot 
> sequence). 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] dnhatn commented on pull request #1552: LUCENE-8962

2020-06-03 Thread GitBox


dnhatn commented on pull request #1552:
URL: https://github.com/apache/lucene-solr/pull/1552#issuecomment-638557223


   @s1monw Can you please take a look at this PR? You already left some 
[comments](https://issues.apache.org/jira/browse/LUCENE-8962?focusedCommentId=17053231&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17053231)
 for it.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14518) Add support for partitioned unique agg to JSON facets

2020-06-03 Thread Daniel Lowe (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17125487#comment-17125487
 ] 

Daniel Lowe commented on SOLR-14518:


I had also encountered a need for this functionality (issue linked). 
uniqueShard would, to me, be an intuitive name for it.

In my actual use case my data happens to be in blocks, and I wanted the (exact) 
unique count of values in a child document field, where some of the child 
documents may have the same value for the field, but values of the field in one 
block never appear in any other block (and by extension also never appear in 
any other shard). Would uniqueBlock(field) help with that?


> Add support for partitioned unique agg to JSON facets
> -
>
> Key: SOLR-14518
> URL: https://issues.apache.org/jira/browse/SOLR-14518
> Project: Solr
>  Issue Type: New Feature
>  Components: Facet Module
>Reporter: Joel Bernstein
>Priority: Major
>
> There are scenarios where documents are partitioned across shards based on 
> the same field that the *unique* agg is applied to with JSON facets. In this 
> scenario exact unique counts can be calculated by simply sending the bucket 
> level unique counts to the aggregator where they can be summed. Suggested 
> syntax is to add a boolean flag to the unique aggregation function: 
> *unique*(partitioned_field, true).
> The *true* value turns on the "partitioned" unique logic. The default is 
> false.
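
To make the suggestion concrete, a hypothetical SolrJ sketch (the second 
argument to unique() is the proposed flag, not an existing API; the field, 
collection, and client are made up):

{code:java}
import org.apache.solr.client.solrj.request.json.JsonQueryRequest;
import org.apache.solr.client.solrj.response.QueryResponse;

JsonQueryRequest req = new JsonQueryRequest()
    .setQuery("*:*")
    // Proposed syntax: the boolean turns on the "partitioned" unique logic.
    .withStatFacet("unique_users", "unique(user_id, true)");

// solrClient is an existing SolrClient instance (assumed).
QueryResponse rsp = req.process(solrClient, "collection1");
{code}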



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Comment Edited] (SOLR-14518) Add support for partitioned unique agg to JSON facets

2020-06-03 Thread Daniel Lowe (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17125487#comment-17125487
 ] 

Daniel Lowe edited comment on SOLR-14518 at 6/4/20, 2:38 AM:
-

I had also encountered a need for this functionality (issue linked). 
uniqueShard would, to me, be an intuitive name for it.

In my actual use case my data happens to be in blocks, and I wanted the (exact) 
unique count of values in a child document field, where some of the child 
documents may have the same value for the field, but values of the field in one 
block never appear in any other block (and by extension also never appear in 
any other shard). Would uniqueBlock(field) help with that?


was (Author: dan2097):
I also had encountered a need for this functionality (issue linked). 
uniqueShard would to me be an intuitive name for this functionality.

In my actual use case my data happens to be in blocks, and I wanted the (exact) 
unique count of values in a child document field, where some of the child 
documents may have the same value for the field, but values of the field in one 
block never appear in any other block (and by extension also never appear in 
any other shard). Would uniqueBlock(field) help with that?

{{}}

> Add support for partitioned unique agg to JSON facets
> -
>
> Key: SOLR-14518
> URL: https://issues.apache.org/jira/browse/SOLR-14518
> Project: Solr
>  Issue Type: New Feature
>  Components: Facet Module
>Reporter: Joel Bernstein
>Priority: Major
>
> There are scenarios where documents are partitioned across shards based on 
> the same field that the *unique* agg is applied to with JSON facets. In this 
> scenario exact unique counts can be calculated by simply sending the bucket 
> level unique counts to the aggregator where they can be summed. Suggested 
> syntax is to add a boolean flag to the unique aggregation function: 
> *unique*(partitioned_field, true).
> The *true* value turns on the "partitioned" unique logic. The default is 
> false.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9390) Kuromoji tokenizer discards tokens if they start with a punctuation character

2020-06-03 Thread Tomoko Uchida (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17125542#comment-17125542
 ] 

Tomoko Uchida commented on LUCENE-9390:
---

Personally, I usually set the "discardPunctuation" flag to false to avoid such 
subtle situations.

As a possible solution, instead of the "discardPunctuation" flag we could add 
a token filter that discards all tokens composed only of punctuation 
characters after tokenization (just like the stop filter)? To me, it is a 
token filter's job rather than a tokenizer's...
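
A minimal sketch of that approach -- a hypothetical filter, not an existing 
Lucene class -- keeping discardPunctuation=false on the tokenizer and dropping 
punctuation-only tokens afterwards:

{code:java}
import org.apache.lucene.analysis.FilteringTokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public final class DropPunctuationOnlyFilter extends FilteringTokenFilter {
  private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);

  public DropPunctuationOnlyFilter(TokenStream in) {
    super(in);
  }

  @Override
  protected boolean accept() {
    // Keep the token if any character is not punctuation: "(株)" survives,
    // a bare "(" does not. The check below is a rough stand-in for the
    // tokenizer's own punctuation test.
    for (int i = 0; i < termAtt.length(); i++) {
      switch (Character.getType(termAtt.charAt(i))) {
        case Character.DASH_PUNCTUATION:
        case Character.START_PUNCTUATION:
        case Character.END_PUNCTUATION:
        case Character.CONNECTOR_PUNCTUATION:
        case Character.OTHER_PUNCTUATION:
        case Character.INITIAL_QUOTE_PUNCTUATION:
        case Character.FINAL_QUOTE_PUNCTUATION:
          break; // punctuation: keep scanning
        default:
          return true; // found a non-punctuation character
      }
    }
    return false; // punctuation-only token: discard
  }
}
{code}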

> Kuromoji tokenizer discards tokens if they start with a punctuation character
> -
>
> Key: LUCENE-9390
> URL: https://issues.apache.org/jira/browse/LUCENE-9390
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Jim Ferenczi
>Priority: Minor
>
> This issue was first raised in Elasticsearch 
> [here|https://github.com/elastic/elasticsearch/issues/57614]
> The unidic dictionary that is used by the Kuromoji tokenizer contains entries 
> that mix punctuation and other characters. For instance, the following entry:
> _(株),1285,1285,3690,名詞,一般,*,*,*,*,(株),カブシキガイシャ,カブシキガイシャ_
> can be found in the Noun.csv file.
> Today, tokens that start with punctuation are automatically removed by 
> default (discardPunctuation is true). I think the code was written this way 
> because we expect punctuation to be separated from normal tokens, but there 
> are exceptions in the original dictionary. Maybe we should check the entire 
> token when discarding punctuation?
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Comment Edited] (LUCENE-9390) Kuromoji tokenizer discards tokens if they start with a punctuation character

2020-06-03 Thread Tomoko Uchida (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17125542#comment-17125542
 ] 

Tomoko Uchida edited comment on LUCENE-9390 at 6/4/20, 4:54 AM:


Personally, I usually set the "discardPunctuation" flag to false to avoid such 
subtle situations.

As a possible solution, instead of the "discardPunctuation" flag we could add 
a token filter that discards all tokens composed only of punctuation 
characters after tokenization (just like the stop filter)? To me, it is a 
token filter's job rather than a tokenizer's...


was (Author: tomoko uchida):
Personally, I usually set the "discardPunctuation" flag to False to avoid such 
subtle situation.

As a possible solution, instead of "discardPunctuation" flag we could add a 
token filter to discard tokens that remove all tokens which is composed only of 
punctuation characters after tokenization (just like stop filter) ? To me, it 
is a token filter's job rather than a tokenizer...

> Kuromoji tokenizer discards tokens if they start with a punctuation character
> -
>
> Key: LUCENE-9390
> URL: https://issues.apache.org/jira/browse/LUCENE-9390
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Jim Ferenczi
>Priority: Minor
>
> This issue was first raised in Elasticsearch 
> [here|https://github.com/elastic/elasticsearch/issues/57614]
> The unidic dictionary that is used by the Kuromoji tokenizer contains entries 
> that mix punctuation and other characters. For instance, the following entry:
> _(株),1285,1285,3690,名詞,一般,*,*,*,*,(株),カブシキガイシャ,カブシキガイシャ_
> can be found in the Noun.csv file.
> Today, tokens that start with punctuation are automatically removed by 
> default (discardPunctuation is true). I think the code was written this way 
> because we expect punctuation to be separated from normal tokens, but there 
> are exceptions in the original dictionary. Maybe we should check the entire 
> token when discarding punctuation?
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9390) Kuromoji tokenizer discards tokens if they start with a punctuation character

2020-06-03 Thread Jun Ohtani (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9390?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17125547#comment-17125547
 ] 

Jun Ohtani commented on LUCENE-9390:


IMO, we should remove the flag and have Kuromoji output punctuation characters 
(including tokens that start with punctuation characters).

Then we can handle those tokens with a token filter. I think we can use the 
part-of-speech token filter to remove such tokens.

> Kuromoji tokenizer discards tokens if they start with a punctuation character
> -
>
> Key: LUCENE-9390
> URL: https://issues.apache.org/jira/browse/LUCENE-9390
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Jim Ferenczi
>Priority: Minor
>
> This issue was first raised in Elasticsearch 
> [here|https://github.com/elastic/elasticsearch/issues/57614]
> The unidic dictionary that is used by the Kuromoji tokenizer contains entries 
> that mix punctuation and other characters. For instance, the following entry:
> _(株),1285,1285,3690,名詞,一般,*,*,*,*,(株),カブシキガイシャ,カブシキガイシャ_
> can be found in the Noun.csv file.
> Today, tokens that start with punctuation are automatically removed by 
> default (discardPunctuation is true). I think the code was written this way 
> because we expect punctuation to be separated from normal tokens, but there 
> are exceptions in the original dictionary. Maybe we should check the entire 
> token when discarding punctuation?
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org