[GitHub] [lucene-solr] iverase opened a new pull request #2131: LUCENE-9552: make sure we don't construct Illegal rectangles due to quantization
iverase opened a new pull request #2131: URL: https://github.com/apache/lucene-solr/pull/2131 This commit adds a correction to minLat/maxLat when the rectangle becomes invalid during quantization. CC: @nknize This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
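For illustration only, here is a minimal sketch of the kind of correction described in the PR (not the actual patch): it quantizes a latitude range with Lucene's GeoEncodingUtils and collapses the range if the encoded minimum ends up above the encoded maximum.

```java
import org.apache.lucene.geo.GeoEncodingUtils;

// Illustrative sketch: quantize a latitude range and keep it a valid rectangle edge.
public class QuantizedLatRange {
  public static double[] quantizeLatRange(double minLat, double maxLat) {
    // encode the bounds "inward" so the quantized box does not grow past the original box
    double qMin = GeoEncodingUtils.decodeLatitude(GeoEncodingUtils.encodeLatitudeCeil(minLat));
    double qMax = GeoEncodingUtils.decodeLatitude(GeoEncodingUtils.encodeLatitude(maxLat));
    if (qMin > qMax) {
      // quantization inverted the range; collapse it to a single valid latitude
      qMin = qMax;
    }
    return new double[] {qMin, qMax};
  }
}
```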
[jira] [Commented] (SOLR-15010) Missing jstack warning is alarming, when using bin/solr as client interface to solr
[ https://issues.apache.org/jira/browse/SOLR-15010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17245741#comment-17245741 ] Jan Høydahl commented on SOLR-15010: +1 to fallback to jattach in bin/solr if jstack is not found. See https://github.com/apangin/jattach. > Missing jstack warning is alarming, when using bin/solr as client interface > to solr > --- > > Key: SOLR-15010 > URL: https://issues.apache.org/jira/browse/SOLR-15010 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Affects Versions: 8.7 >Reporter: David Eric Pugh >Priority: Minor > > In SOLR-14442 we added a warning if jstack wasn't found. I notice that I > use the bin/solr command a lot as a client, so bin solr zk or bin solr > healthcheck. > For example: > {{docker exec solr1 solr zk cp /security.json zk:security.json -z zoo1:2181}} > All of these emit the message: > The currently defined JAVA_HOME (/usr/local/openjdk-11) refers to a location > where java was found but jstack was not found. Continuing. > This is somewhat alarming, and then becomes annoying. Thoughts on maybe > only conducting this check if you are running {{bin/solr start}} or one of > the other commands that is actually starting Solr as a process? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] romseygeek commented on a change in pull request #2127: LUCENE-9633: Improve match highlighter behavior for degenerate intervals
romseygeek commented on a change in pull request #2127: URL: https://github.com/apache/lucene-solr/pull/2127#discussion_r538157532 ## File path: lucene/highlighter/src/test/org/apache/lucene/search/matchhighlight/TestMatchRegionRetriever.java ## @@ -361,6 +374,41 @@ public void testIntervalQueries() throws IOException { ); } + @Test + public void testDegenerateIntervalsWithPositions() throws IOException { +testDegenerateIntervals(FLD_TEXT_POS); + } + + @Test @AwaitsFix(bugUrl = "https://issues.apache.org/jira/browse/LUCENE-9634: " + Review comment: So `extend` will widen the bounds of an interval's positions, but leave its offsets untouched (because it has no way of knowing what the offsets actually are). I sort of think that just highlighting the original term is the correct behaviour? But there will be a discrepancy when we generate offsets directly from the token stream by comparing to positions. I see that ExtendedIntervalIterator's javadoc is incorrect regarding prefixes. It says ``` An interval with prefix bounds extended by n will skip over matches that appear in positions lower than n ``` but it actually just readjusts these matches to start at position 0. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] HoustonPutman commented on pull request #2130: Adding Apache Reporter step in Release Wizard.
HoustonPutman commented on pull request #2130: URL: https://github.com/apache/lucene-solr/pull/2130#issuecomment-740486427 Thanks for checking on that, Anshum. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] HoustonPutman merged pull request #2130: Adding Apache Reporter step in Release Wizard.
HoustonPutman merged pull request #2130: URL: https://github.com/apache/lucene-solr/pull/2130 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-8673) o.a.s.search.facet classes not public/extendable
[ https://issues.apache.org/jira/browse/SOLR-8673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17245758#comment-17245758 ] ASF subversion and git services commented on SOLR-8673: --- Commit 6f357af0c10e0dc3d84cbef4a48fe2ba0b566d7d in lucene-solr's branch refs/heads/branch_8x from Mikhail Khludnev [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=6f357af ] SOLR-8673: fix build. > o.a.s.search.facet classes not public/extendable > > > Key: SOLR-8673 > URL: https://issues.apache.org/jira/browse/SOLR-8673 > Project: Solr > Issue Type: Improvement > Components: Facet Module >Affects Versions: 5.4.1 >Reporter: Markus Jelsma >Priority: Major > Fix For: 6.2, 7.0 > > Attachments: SOLR-8673.patch, SOLR-8673.patch > > Time Spent: 1h 20m > Remaining Estimate: 0h > > It is not easy to create a custom JSON facet function. A simple function > based on AvgAgg quickly results in the following compilation failures: > {code} > [ERROR] Failed to execute goal > org.apache.maven.plugins:maven-compiler-plugin:3.3:compile (default-compile) > on project openindex-solr: Compilation failure: Compilation failure: > [ERROR] > /home/markus/projects/openindex/solr/trunk/src/main/java/i.o.s.search/facet/CustomAvgAgg.java:[22,36] > org.apache.solr.search.facet.FacetContext is not public in > org.apache.solr.search.facet; cannot be accessed from outside package > [ERROR] > /home/markus/projects/openindex/solr/trunk/src/main/java/i.o.s.search/facet/CustomAvgAgg.java:[23,36] > org.apache.solr.search.facet.FacetDoubleMerger is not public in > org.apache.solr.search.facet; cannot be accessed from outside package > [ERROR] > /home/markus/projects/openindex/solr/trunk/src/main/java/i.o.s.search/facet/CustomAvgAgg.java:[40,32] > cannot find symbol > [ERROR] symbol: class FacetContext > [ERROR] location: class i.o.s.search.facet.CustomAvgAgg > [ERROR] > /home/markus/projects/openindex/solr/trunk/src/main/java/i.o.s.search/facet/CustomAvgAgg.java:[49,39] > cannot find symbol > [ERROR] symbol: class FacetDoubleMerger > [ERROR] location: class i.o.s.search.facet.CustomAvgAgg > [ERROR] > /home/markus/projects/openindex/solr/trunk/src/main/java/i.o.s.search/facet/CustomAvgAgg.java:[54,43] > cannot find symbol > [ERROR] symbol: class Context > [ERROR] location: class i.o.s.search.facet.CustomAvgAgg.Merger > [ERROR] > /home/markus/projects/openindex/solr/trunk/src/main/java/i.o.s.search/facet/CustomAvgAgg.java:[41,16] > cannot find symbol > [ERROR] symbol: class AvgSlotAcc > [ERROR] location: class i.o.s.search.facet.CustomAvgAgg > [ERROR] > /home/markus/projects/openindex/solr/trunk/src/main/java/i.o.s.search/facet/CustomAvgAgg.java:[46,12] > incompatible types: i.o.s.search.facet.CustomAvgAgg.Merger cannot be > converted to org.apache.solr.search.facet.FacetMerger > [ERROR] > /home/markus/projects/openindex/solr/trunk/src/main/java/i.o.s.search/facet/CustomAvgAgg.java:[53,5] > method does not override or implement a method from a supertype > [ERROR] > /home/markus/projects/openindex/solr/trunk/src/main/java/i.o.s.search/facet/CustomAvgAgg.java:[60,5] > method does not override or implement a method from a supertype > {code} > It seems lots of classes are tucked away in FacetModule, which we can't reach > from outside. 
> Originates from this thread: > http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201602.mbox/%3ccab_8yd9ldbg_0zxm_h1igkfm6bqeypd5ilyy7tty8cztscv...@mail.gmail.com%3E > ( also available at > https://lists.apache.org/thread.html/9fddcad3136ec908ce1c57881f8d3069e5d153f08b71f80f3e18d995%401455019826%40%3Csolr-user.lucene.apache.org%3E > ) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] dweiss commented on a change in pull request #2127: LUCENE-9633: Improve match highlighter behavior for degenerate intervals
dweiss commented on a change in pull request #2127: URL: https://github.com/apache/lucene-solr/pull/2127#discussion_r538162712 ## File path: lucene/highlighter/src/test/org/apache/lucene/search/matchhighlight/TestMatchRegionRetriever.java ## @@ -361,6 +374,41 @@ public void testIntervalQueries() throws IOException { ); } + @Test + public void testDegenerateIntervalsWithPositions() throws IOException { +testDegenerateIntervals(FLD_TEXT_POS); + } + + @Test @AwaitsFix(bugUrl = "https://issues.apache.org/jira/browse/LUCENE-9634: " + Review comment: > I sort of think that just highlighting the original term is the correct behaviour? Hmm... I don't think I agree. When you have a query parser that allows intervals then extend becomes a function just like anything else. The intuitive user expectation for a query extend(foo 2 2) is to actually highlight the matching interval of positions (well, users think of "words") pointed to by that interval. This is particularly important if you're building more complex expressions out of these (left/ right/ extend, etc.) and you wish to see partial fragments as you're building more focused expressions. I'm not saying this has to be fixed (neither do I know how it should) but it's real feedback from people who use those queries intensively (and my gut feeling agrees). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
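For reference, a minimal sketch (not taken from the PR) of the `extend(foo 2 2)` expression mentioned above, expressed against the lucene-queries intervals API; the field name is hypothetical.

```java
import org.apache.lucene.queries.intervals.IntervalQuery;
import org.apache.lucene.queries.intervals.Intervals;
import org.apache.lucene.queries.intervals.IntervalsSource;
import org.apache.lucene.search.Query;

public class ExtendExample {
  public static Query extendFooByTwo() {
    // widen the positions of the "foo" interval by two positions on each side;
    // the offsets reported for the resulting match are the subject of the discussion above
    IntervalsSource extended = Intervals.extend(Intervals.term("foo"), 2, 2);
    return new IntervalQuery("text_field", extended);
  }
}
```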
[GitHub] [lucene-solr] romseygeek commented on a change in pull request #2127: LUCENE-9633: Improve match highlighter behavior for degenerate intervals
romseygeek commented on a change in pull request #2127: URL: https://github.com/apache/lucene-solr/pull/2127#discussion_r538220443 ## File path: lucene/highlighter/src/test/org/apache/lucene/search/matchhighlight/TestMatchRegionRetriever.java ## @@ -361,6 +374,41 @@ public void testIntervalQueries() throws IOException { ); } + @Test + public void testDegenerateIntervalsWithPositions() throws IOException { +testDegenerateIntervals(FLD_TEXT_POS); + } + + @Test @AwaitsFix(bugUrl = "https://issues.apache.org/jira/browse/LUCENE-9634: " + Review comment: Fair enough! I originally added `extend` to deal with stopwords and to help implement `before` and `after` filters, but if it's being used elsewhere then that's all good. I'm interested in how it's being exposed in query parsers - we don't actually have it as an option in the elasticsearch intervals DSL but maybe we ought to add it? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Created] (SOLR-15034) CoreAdmin STATUS should also return config set
Andreas Hubold created SOLR-15034: - Summary: CoreAdmin STATUS should also return config set Key: SOLR-15034 URL: https://issues.apache.org/jira/browse/SOLR-15034 Project: Solr Issue Type: Improvement Security Level: Public (Default Security Level. Issues are Public) Affects Versions: 8.6.3 Reporter: Andreas Hubold Currently, the CoreAdmin STATUS response does not return the config set of the core. It would be nice if it could be included in the result. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-15034) CoreAdmin STATUS should also return config set
[ https://issues.apache.org/jira/browse/SOLR-15034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17245836#comment-17245836 ] Andreas Hubold commented on SOLR-15034: --- I've asked on solr-user mailing list: [https://mail-archives.apache.org/mod_mbox/lucene-solr-user/202012.mbox/%3Ca77c6c99-a62b-0b4b-e63d-4dc851814f34%40coremedia.com%3E] > CoreAdmin STATUS should also return config set > -- > > Key: SOLR-15034 > URL: https://issues.apache.org/jira/browse/SOLR-15034 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Affects Versions: 8.6.3 >Reporter: Andreas Hubold >Priority: Major > > Currently, the CoreAdmin STATUS response does not return the config set of > the core. It would be nice if it could be included in the result. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
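For context, a minimal SolrJ sketch of reading the CoreAdmin STATUS response; the `configSet` key is the addition proposed by this issue (it is not returned today), and the core name and base URL are placeholders.

```java
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.CoreAdminRequest;
import org.apache.solr.client.solrj.response.CoreAdminResponse;
import org.apache.solr.common.util.NamedList;

public class CoreStatusExample {
  public static void main(String[] args) throws Exception {
    try (SolrClient client = new HttpSolrClient.Builder("http://localhost:8983/solr").build()) {
      CoreAdminResponse status = CoreAdminRequest.getStatus("mycore", client);
      NamedList<Object> coreStatus = status.getCoreStatus("mycore");
      // "configSet" is the key this issue proposes to add to the STATUS response
      System.out.println("configSet: " + coreStatus.get("configSet"));
    }
  }
}
```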
[jira] [Commented] (SOLR-8673) o.a.s.search.facet classes not public/extendable
[ https://issues.apache.org/jira/browse/SOLR-8673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17245843#comment-17245843 ] Mikhail Khludnev commented on SOLR-8673: [https://builds.apache.org/job/Lucene/job/Lucene-Solr-Tests-8.x/1027/testReport/org.apache.solr.search.function/AggValueSourceTest/] Fixed. > o.a.s.search.facet classes not public/extendable > > > Key: SOLR-8673 > URL: https://issues.apache.org/jira/browse/SOLR-8673 > Project: Solr > Issue Type: Improvement > Components: Facet Module >Affects Versions: 5.4.1 >Reporter: Markus Jelsma >Priority: Major > Fix For: 6.2, 7.0 > > Attachments: SOLR-8673.patch, SOLR-8673.patch > > Time Spent: 1h 20m > Remaining Estimate: 0h > > It is not easy to create a custom JSON facet function. A simple function > based on AvgAgg quickly results in the following compilation failures: > {code} > [ERROR] Failed to execute goal > org.apache.maven.plugins:maven-compiler-plugin:3.3:compile (default-compile) > on project openindex-solr: Compilation failure: Compilation failure: > [ERROR] > /home/markus/projects/openindex/solr/trunk/src/main/java/i.o.s.search/facet/CustomAvgAgg.java:[22,36] > org.apache.solr.search.facet.FacetContext is not public in > org.apache.solr.search.facet; cannot be accessed from outside package > [ERROR] > /home/markus/projects/openindex/solr/trunk/src/main/java/i.o.s.search/facet/CustomAvgAgg.java:[23,36] > org.apache.solr.search.facet.FacetDoubleMerger is not public in > org.apache.solr.search.facet; cannot be accessed from outside package > [ERROR] > /home/markus/projects/openindex/solr/trunk/src/main/java/i.o.s.search/facet/CustomAvgAgg.java:[40,32] > cannot find symbol > [ERROR] symbol: class FacetContext > [ERROR] location: class i.o.s.search.facet.CustomAvgAgg > [ERROR] > /home/markus/projects/openindex/solr/trunk/src/main/java/i.o.s.search/facet/CustomAvgAgg.java:[49,39] > cannot find symbol > [ERROR] symbol: class FacetDoubleMerger > [ERROR] location: class i.o.s.search.facet.CustomAvgAgg > [ERROR] > /home/markus/projects/openindex/solr/trunk/src/main/java/i.o.s.search/facet/CustomAvgAgg.java:[54,43] > cannot find symbol > [ERROR] symbol: class Context > [ERROR] location: class i.o.s.search.facet.CustomAvgAgg.Merger > [ERROR] > /home/markus/projects/openindex/solr/trunk/src/main/java/i.o.s.search/facet/CustomAvgAgg.java:[41,16] > cannot find symbol > [ERROR] symbol: class AvgSlotAcc > [ERROR] location: class i.o.s.search.facet.CustomAvgAgg > [ERROR] > /home/markus/projects/openindex/solr/trunk/src/main/java/i.o.s.search/facet/CustomAvgAgg.java:[46,12] > incompatible types: i.o.s.search.facet.CustomAvgAgg.Merger cannot be > converted to org.apache.solr.search.facet.FacetMerger > [ERROR] > /home/markus/projects/openindex/solr/trunk/src/main/java/i.o.s.search/facet/CustomAvgAgg.java:[53,5] > method does not override or implement a method from a supertype > [ERROR] > /home/markus/projects/openindex/solr/trunk/src/main/java/i.o.s.search/facet/CustomAvgAgg.java:[60,5] > method does not override or implement a method from a supertype > {code} > It seems lots of classes are tucked away in FacetModule, which we can't reach > from outside. 
> Originates from this thread: > http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201602.mbox/%3ccab_8yd9ldbg_0zxm_h1igkfm6bqeypd5ilyy7tty8cztscv...@mail.gmail.com%3E > ( also available at > https://lists.apache.org/thread.html/9fddcad3136ec908ce1c57881f8d3069e5d153f08b71f80f3e18d995%401455019826%40%3Csolr-user.lucene.apache.org%3E > ) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] dweiss commented on a change in pull request #2127: LUCENE-9633: Improve match highlighter behavior for degenerate intervals
dweiss commented on a change in pull request #2127: URL: https://github.com/apache/lucene-solr/pull/2127#discussion_r538313908 ## File path: lucene/highlighter/src/test/org/apache/lucene/search/matchhighlight/TestMatchRegionRetriever.java ## @@ -361,6 +374,41 @@ public void testIntervalQueries() throws IOException { ); } + @Test + public void testDegenerateIntervalsWithPositions() throws IOException { +testDegenerateIntervals(FLD_TEXT_POS); + } + + @Test @AwaitsFix(bugUrl = "https://issues.apache.org/jira/browse/LUCENE-9634: " + Review comment: It is extremely useful to capture and drill down in the context of another query. Let's say apples nearby oranges. Yes, you can achieve a similar thing with other queries but it's pretty useful on its own (because you can first inspect the context you're looking at by running the extend query in isolation). I've modified the flexible query parser and added those functions as a prefix-scoped "language". Looks like this: https://get.carrotsearch.com/lingo4g/1.12.0-SNAPSHOT/doc/#interval-functions And combined with the matches highlighter it really shines. It's at its best when you get multiple overlapping intervals; I don't have an example on this computer (I have a day off on home duties) but I can send you one later on - you can do some really impressive stuff with intervals! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] munendrasn commented on a change in pull request #2123: SOLR-10732: short circuit calls to searcher#numDocs when base is empty
munendrasn commented on a change in pull request #2123: URL: https://github.com/apache/lucene-solr/pull/2123#discussion_r538315749 ## File path: solr/core/src/java/org/apache/solr/search/facet/FacetProcessor.java ## @@ -419,7 +419,7 @@ void fillBucket(SimpleOrderedMap bucket, Query q, DocSet result, boolean } count = result.size(); // don't really need this if we are skipping, but it's free. } else { - if (q == null) { + if (q == null || fcontext.base.size() == 0) { Review comment: This is done https://github.com/apache/lucene-solr/pull/2123/commits/c194e09ca0d2df32acf21875c7625f9e862fdc09 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] munendrasn commented on a change in pull request #2123: SOLR-10732: short circuit calls to searcher#numDocs when base is empty
munendrasn commented on a change in pull request #2123: URL: https://github.com/apache/lucene-solr/pull/2123#discussion_r538316824 ## File path: solr/core/src/java/org/apache/solr/request/SimpleFacets.java ## @@ -903,7 +910,7 @@ public void execute(Runnable r) { private int numDocs(String term, final SchemaField sf, final FieldType ft, final DocSet baseDocset) { try { - return searcher.numDocs(ft.getFieldQuery(null, sf, term), baseDocset); + return baseDocset.size() == 0? 0: searcher.numDocs(ft.getFieldQuery(null, sf, term), baseDocset); Review comment: sorting by count won't be done if the baseDocSet size is 0, but I have kept this check in numDocs so that any future usage can benefit from it. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] dweiss commented on pull request #2127: LUCENE-9633: Improve match highlighter behavior for degenerate intervals
dweiss commented on pull request #2127: URL: https://github.com/apache/lucene-solr/pull/2127#issuecomment-740595732 I plan to commit it in (with assume-disabled test involving position+offsets) if nobody objects. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] munendrasn commented on a change in pull request #2123: SOLR-10732: short circuit calls to searcher#numDocs when base is empty
munendrasn commented on a change in pull request #2123: URL: https://github.com/apache/lucene-solr/pull/2123#discussion_r538323161 ## File path: solr/core/src/java/org/apache/solr/request/SimpleFacets.java ## @@ -325,6 +329,9 @@ public void getFacetQueryCount(ParsedParams parsed, NamedList res) thro * @see FacetParams#FACET_QUERY */ public int getGroupedFacetQueryCount(Query facetQuery, DocSet docSet) throws IOException { +if (docSet.size() == 0) { + return 0; +} Review comment: Same as above This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14397) Vector Search in Solr
[ https://issues.apache.org/jira/browse/SOLR-14397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17245867#comment-17245867 ] Alessandro Benedetti commented on SOLR-14397: - Should we resume this work, now that https://issues.apache.org/jira/browse/LUCENE-9004 has been officially merged to master? I read it superficially and I have not yet explored the code, but the aforementioned contribution seems quite relevant, potentially is now the right time to redefine the design? > Vector Search in Solr > - > > Key: SOLR-14397 > URL: https://issues.apache.org/jira/browse/SOLR-14397 > Project: Solr > Issue Type: Improvement >Reporter: Trey Grainger >Priority: Major > Time Spent: 0.5h > Remaining Estimate: 0h > > Search engines have traditionally relied upon token-based matching (typically > keywords) on an inverted index, plus relevance ranking based upon keyword > occurrence statistics. This can be viewed as a "sparse vector” match (where > each term is a one-hot encoded dimension in the vector), since only a few > keywords out of all possible keywords are considered in each query. With the > introduction of deep-learning-based transformers over the last few years, > however, the state of the art in relevance has moved to ranking models based > upon dense vectors that encode a latent, semantic understanding of both > language constructs and the underlying domain upon which the model was > trained. These dense vectors are also referred to as “embeddings”. An example > of this kind of embedding would be taking the phrase “chief executive officer > of the tech company” and converting it to [0.03, 1.7, 9.12, 0, 0.3] > . Other similar phrases should encode to vectors with very similar numbers, > so we may expect a query like “CEO of a technology org” to generate a vector > like [0.1, 1.9, 8.9, 0.1, 0.4]. When performing a cosine similarity > calculation between these vectors, we would expect a number closer to 1.0, > whereas a very unrelated text blurb would generate a much smaller cosine > similarity. > This is a proposal for how we should implement these vector search > capabilities in Solr. > h1. Search Process Overview: > In order to implement dense vector search, the following process is typically > followed: > h2. Offline: > An encoder is built. An encoder can take in text (a query, a sentence, a > paragraph, a document, etc.) and return a dense vector representing that > document in a rich semantic space. The semantic space is learned from > training on textual data (usually, though other sources work, too), typically > from the domain of the search engine. > h2. Document Ingestion: > When documents are processed, they are passed to the encoder, and the dense > vector(s) returned are stored as fields on the document. There could be one > or more vectors per-document, as the granularity of the vectors could be > per-document, per field, per paragraph, per-sentence, or even per phrase or > per term. > h2. Query Time: > *Encoding:* The query is translated to a dense vector by passing it to the > encoder > Quantization: The query is quantized. Quantization is the process of taking > a vector with many values and turning it into “terms” in a vector space that > approximates the full vector space of the dense vectors. > *ANN Matching:* A query on the quantized vector tokens is executed as an ANN > (approximate nearest neighbor) search. 
This allows finding most of the best > matching documents (typically up to 95%) with a traditional and efficient > lookup against the inverted index. > _(optional)_ *ANN Ranking*: ranking may be performed based upon the matched > quantized tokens to get a rough, initial ranking of documents based upon the > similarity of the query and document vectors. This allows the next step > (re-ranking) to be performed on a smaller subset of documents. > *Re-Ranking:* Once the initial matching (and optionally ANN ranking) is > performed, a similarity calculation (cosine, dot-product, or any number of > other calculations) is typically performed between the full (non-quantized) > dense vectors for the query and those in the document. This re-ranking will > typically be on the top-N results for performance reasons. > *Return Results:* As with any search, the final step is typically to return > the results in relevance-ranked order. In this case, that would be sorted by > the re-ranking similarity score (i.e. “cosine descending”). > -- > *Variant:* For small document sets, it may be preferable to rank all > documents and skip steps steps 2, 3, and 4. This is because ANN Matching > typically reduces recall (current state of the art is around 95% recall), so > it can be beneficial to rank all documents if performance is not a concern. > In thi
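As a small illustration of the re-ranking step described above, here is a self-contained cosine similarity calculation over the example embeddings given in the issue text (the vector values are the ones from the description).

```java
public class CosineExample {
  static double cosine(double[] a, double[] b) {
    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.length; i++) {
      dot += a[i] * b[i];
      normA += a[i] * a[i];
      normB += b[i] * b[i];
    }
    return dot / (Math.sqrt(normA) * Math.sqrt(normB));
  }

  public static void main(String[] args) {
    double[] doc = {0.03, 1.7, 9.12, 0.0, 0.3}; // "chief executive officer of the tech company"
    double[] query = {0.1, 1.9, 8.9, 0.1, 0.4}; // "CEO of a technology org"
    System.out.println(cosine(query, doc));     // close to 1.0 for semantically similar text
  }
}
```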
[jira] [Commented] (LUCENE-9629) Use computed mask values in ForUtil
[ https://issues.apache.org/jira/browse/LUCENE-9629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17245883#comment-17245883 ] Feng Guo commented on LUCENE-9629: -- [~jpountz] Sorry to bother you! now we have come into a new week, can you please help merge this PR? > Use computed mask values in ForUtil > --- > > Key: LUCENE-9629 > URL: https://issues.apache.org/jira/browse/LUCENE-9629 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs >Reporter: Feng Guo >Priority: Minor > Time Spent: 1h 10m > Remaining Estimate: 0h > > In the class ForUtil, mask values have been computed and stored in static > final variables, but they are recomputed for every encoding, which may be > unnecessary. > another small fix is to change > {code:java} > remainingBitsPerValue > remainingBitsPerLong{code} > to > {code:java} > remainingBitsPerValue >= remainingBitsPerLong{code} > otherwise > {code:java} > if (remainingBitsPerValue == 0) { > idx++; > remainingBitsPerValue = bitsPerValue; > } > {code} > this code will never be reached. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
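A minimal sketch (not ForUtil's actual code) of the first point in the issue: the contrast between rebuilding a mask on every call and reusing a precomputed static final, shown here for the 16-bit lane mask that also appears later in this thread.

```java
public class MaskExample {
  // computed once at class initialization; the JIT can treat static finals as constants
  private static final long MASK16_1 = mask16(1); // 0x0001000100010001L

  private static long mask16(int bits) {
    long mask = (1L << bits) - 1;
    return mask | (mask << 16) | (mask << 32) | (mask << 48);
  }

  static long maskedRecomputed(long value) {
    return value & mask16(1); // mask rebuilt on every call
  }

  static long maskedPrecomputed(long value) {
    return value & MASK16_1; // reuses the precomputed constant
  }
}
```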
[jira] [Comment Edited] (LUCENE-9629) Use computed mask values in ForUtil
[ https://issues.apache.org/jira/browse/LUCENE-9629?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17245883#comment-17245883 ] Feng Guo edited comment on LUCENE-9629 at 12/8/20, 1:29 PM: [~jpountz] Sorry to bother you! now we have come into a new week, could you please help merge this PR? was (Author: gf2121): [~jpountz] Sorry to bother you! now we have come into a new week, can you please help merge this PR? > Use computed mask values in ForUtil > --- > > Key: LUCENE-9629 > URL: https://issues.apache.org/jira/browse/LUCENE-9629 > Project: Lucene - Core > Issue Type: Improvement > Components: core/codecs >Reporter: Feng Guo >Priority: Minor > Time Spent: 1h 10m > Remaining Estimate: 0h > > In the class ForUtil, mask values have been computed and stored in static > final variables, but they are recomputed for every encoding, which may be > unnecessary. > another small fix is to change > {code:java} > remainingBitsPerValue > remainingBitsPerLong{code} > to > {code:java} > remainingBitsPerValue >= remainingBitsPerLong{code} > otherwise > {code:java} > if (remainingBitsPerValue == 0) { > idx++; > remainingBitsPerValue = bitsPerValue; > } > {code} > this code will never be reached. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-8673) o.a.s.search.facet classes not public/extendable
[ https://issues.apache.org/jira/browse/SOLR-8673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17245843#comment-17245843 ] Mikhail Khludnev edited comment on SOLR-8673 at 12/8/20, 1:49 PM: -- [https://builds.apache.org/job/Lucene/job/Lucene-Solr-Tests-8.x/1027/testReport/org.apache.solr.search.function/AggValueSourceTest/] https://builds.apache.org/job/Lucene/job/Lucene-Solr-NightlyTests-master/lastCompletedBuild/testReport/org.apache.solr.search.function/AggValueSourceTest/ Fixed. was (Author: mkhludnev): [https://builds.apache.org/job/Lucene/job/Lucene-Solr-Tests-8.x/1027/testReport/org.apache.solr.search.function/AggValueSourceTest/] Fixed. > o.a.s.search.facet classes not public/extendable > > > Key: SOLR-8673 > URL: https://issues.apache.org/jira/browse/SOLR-8673 > Project: Solr > Issue Type: Improvement > Components: Facet Module >Affects Versions: 5.4.1 >Reporter: Markus Jelsma >Priority: Major > Fix For: 6.2, 7.0 > > Attachments: SOLR-8673.patch, SOLR-8673.patch > > Time Spent: 1h 20m > Remaining Estimate: 0h > > It is not easy to create a custom JSON facet function. A simple function > based on AvgAgg quickly results in the following compilation failures: > {code} > [ERROR] Failed to execute goal > org.apache.maven.plugins:maven-compiler-plugin:3.3:compile (default-compile) > on project openindex-solr: Compilation failure: Compilation failure: > [ERROR] > /home/markus/projects/openindex/solr/trunk/src/main/java/i.o.s.search/facet/CustomAvgAgg.java:[22,36] > org.apache.solr.search.facet.FacetContext is not public in > org.apache.solr.search.facet; cannot be accessed from outside package > [ERROR] > /home/markus/projects/openindex/solr/trunk/src/main/java/i.o.s.search/facet/CustomAvgAgg.java:[23,36] > org.apache.solr.search.facet.FacetDoubleMerger is not public in > org.apache.solr.search.facet; cannot be accessed from outside package > [ERROR] > /home/markus/projects/openindex/solr/trunk/src/main/java/i.o.s.search/facet/CustomAvgAgg.java:[40,32] > cannot find symbol > [ERROR] symbol: class FacetContext > [ERROR] location: class i.o.s.search.facet.CustomAvgAgg > [ERROR] > /home/markus/projects/openindex/solr/trunk/src/main/java/i.o.s.search/facet/CustomAvgAgg.java:[49,39] > cannot find symbol > [ERROR] symbol: class FacetDoubleMerger > [ERROR] location: class i.o.s.search.facet.CustomAvgAgg > [ERROR] > /home/markus/projects/openindex/solr/trunk/src/main/java/i.o.s.search/facet/CustomAvgAgg.java:[54,43] > cannot find symbol > [ERROR] symbol: class Context > [ERROR] location: class i.o.s.search.facet.CustomAvgAgg.Merger > [ERROR] > /home/markus/projects/openindex/solr/trunk/src/main/java/i.o.s.search/facet/CustomAvgAgg.java:[41,16] > cannot find symbol > [ERROR] symbol: class AvgSlotAcc > [ERROR] location: class i.o.s.search.facet.CustomAvgAgg > [ERROR] > /home/markus/projects/openindex/solr/trunk/src/main/java/i.o.s.search/facet/CustomAvgAgg.java:[46,12] > incompatible types: i.o.s.search.facet.CustomAvgAgg.Merger cannot be > converted to org.apache.solr.search.facet.FacetMerger > [ERROR] > /home/markus/projects/openindex/solr/trunk/src/main/java/i.o.s.search/facet/CustomAvgAgg.java:[53,5] > method does not override or implement a method from a supertype > [ERROR] > /home/markus/projects/openindex/solr/trunk/src/main/java/i.o.s.search/facet/CustomAvgAgg.java:[60,5] > method does not override or implement a method from a supertype > {code} > It seems lots of classes are tucked away in FacetModule, which we can't reach > from outside. 
> Originates from this thread: > http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201602.mbox/%3ccab_8yd9ldbg_0zxm_h1igkfm6bqeypd5ilyy7tty8cztscv...@mail.gmail.com%3E > ( also available at > https://lists.apache.org/thread.html/9fddcad3136ec908ce1c57881f8d3069e5d153f08b71f80f3e18d995%401455019826%40%3Csolr-user.lucene.apache.org%3E > ) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (SOLR-8673) o.a.s.search.facet classes not public/extendable
[ https://issues.apache.org/jira/browse/SOLR-8673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Khludnev updated SOLR-8673: --- Fix Version/s: (was: 6.2) (was: 7.0) 8.8 > o.a.s.search.facet classes not public/extendable > > > Key: SOLR-8673 > URL: https://issues.apache.org/jira/browse/SOLR-8673 > Project: Solr > Issue Type: Improvement > Components: Facet Module >Affects Versions: 5.4.1 >Reporter: Markus Jelsma >Priority: Major > Fix For: 8.8 > > Attachments: SOLR-8673.patch, SOLR-8673.patch > > Time Spent: 1h 20m > Remaining Estimate: 0h > > It is not easy to create a custom JSON facet function. A simple function > based on AvgAgg quickly results in the following compilation failures: > {code} > [ERROR] Failed to execute goal > org.apache.maven.plugins:maven-compiler-plugin:3.3:compile (default-compile) > on project openindex-solr: Compilation failure: Compilation failure: > [ERROR] > /home/markus/projects/openindex/solr/trunk/src/main/java/i.o.s.search/facet/CustomAvgAgg.java:[22,36] > org.apache.solr.search.facet.FacetContext is not public in > org.apache.solr.search.facet; cannot be accessed from outside package > [ERROR] > /home/markus/projects/openindex/solr/trunk/src/main/java/i.o.s.search/facet/CustomAvgAgg.java:[23,36] > org.apache.solr.search.facet.FacetDoubleMerger is not public in > org.apache.solr.search.facet; cannot be accessed from outside package > [ERROR] > /home/markus/projects/openindex/solr/trunk/src/main/java/i.o.s.search/facet/CustomAvgAgg.java:[40,32] > cannot find symbol > [ERROR] symbol: class FacetContext > [ERROR] location: class i.o.s.search.facet.CustomAvgAgg > [ERROR] > /home/markus/projects/openindex/solr/trunk/src/main/java/i.o.s.search/facet/CustomAvgAgg.java:[49,39] > cannot find symbol > [ERROR] symbol: class FacetDoubleMerger > [ERROR] location: class i.o.s.search.facet.CustomAvgAgg > [ERROR] > /home/markus/projects/openindex/solr/trunk/src/main/java/i.o.s.search/facet/CustomAvgAgg.java:[54,43] > cannot find symbol > [ERROR] symbol: class Context > [ERROR] location: class i.o.s.search.facet.CustomAvgAgg.Merger > [ERROR] > /home/markus/projects/openindex/solr/trunk/src/main/java/i.o.s.search/facet/CustomAvgAgg.java:[41,16] > cannot find symbol > [ERROR] symbol: class AvgSlotAcc > [ERROR] location: class i.o.s.search.facet.CustomAvgAgg > [ERROR] > /home/markus/projects/openindex/solr/trunk/src/main/java/i.o.s.search/facet/CustomAvgAgg.java:[46,12] > incompatible types: i.o.s.search.facet.CustomAvgAgg.Merger cannot be > converted to org.apache.solr.search.facet.FacetMerger > [ERROR] > /home/markus/projects/openindex/solr/trunk/src/main/java/i.o.s.search/facet/CustomAvgAgg.java:[53,5] > method does not override or implement a method from a supertype > [ERROR] > /home/markus/projects/openindex/solr/trunk/src/main/java/i.o.s.search/facet/CustomAvgAgg.java:[60,5] > method does not override or implement a method from a supertype > {code} > It seems lots of classes are tucked away in FacetModule, which we can't reach > from outside. 
> Originates from this thread: > http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201602.mbox/%3ccab_8yd9ldbg_0zxm_h1igkfm6bqeypd5ilyy7tty8cztscv...@mail.gmail.com%3E > ( also available at > https://lists.apache.org/thread.html/9fddcad3136ec908ce1c57881f8d3069e5d153f08b71f80f3e18d995%401455019826%40%3Csolr-user.lucene.apache.org%3E > ) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (SOLR-8673) o.a.s.search.facet classes not public/extendable
[ https://issues.apache.org/jira/browse/SOLR-8673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mikhail Khludnev updated SOLR-8673: --- Assignee: Mikhail Khludnev Resolution: Fixed Status: Resolved (was: Patch Available) > o.a.s.search.facet classes not public/extendable > > > Key: SOLR-8673 > URL: https://issues.apache.org/jira/browse/SOLR-8673 > Project: Solr > Issue Type: Improvement > Components: Facet Module >Affects Versions: 5.4.1 >Reporter: Markus Jelsma >Assignee: Mikhail Khludnev >Priority: Major > Fix For: 8.8 > > Attachments: SOLR-8673.patch, SOLR-8673.patch > > Time Spent: 1h 20m > Remaining Estimate: 0h > > It is not easy to create a custom JSON facet function. A simple function > based on AvgAgg quickly results in the following compilation failures: > {code} > [ERROR] Failed to execute goal > org.apache.maven.plugins:maven-compiler-plugin:3.3:compile (default-compile) > on project openindex-solr: Compilation failure: Compilation failure: > [ERROR] > /home/markus/projects/openindex/solr/trunk/src/main/java/i.o.s.search/facet/CustomAvgAgg.java:[22,36] > org.apache.solr.search.facet.FacetContext is not public in > org.apache.solr.search.facet; cannot be accessed from outside package > [ERROR] > /home/markus/projects/openindex/solr/trunk/src/main/java/i.o.s.search/facet/CustomAvgAgg.java:[23,36] > org.apache.solr.search.facet.FacetDoubleMerger is not public in > org.apache.solr.search.facet; cannot be accessed from outside package > [ERROR] > /home/markus/projects/openindex/solr/trunk/src/main/java/i.o.s.search/facet/CustomAvgAgg.java:[40,32] > cannot find symbol > [ERROR] symbol: class FacetContext > [ERROR] location: class i.o.s.search.facet.CustomAvgAgg > [ERROR] > /home/markus/projects/openindex/solr/trunk/src/main/java/i.o.s.search/facet/CustomAvgAgg.java:[49,39] > cannot find symbol > [ERROR] symbol: class FacetDoubleMerger > [ERROR] location: class i.o.s.search.facet.CustomAvgAgg > [ERROR] > /home/markus/projects/openindex/solr/trunk/src/main/java/i.o.s.search/facet/CustomAvgAgg.java:[54,43] > cannot find symbol > [ERROR] symbol: class Context > [ERROR] location: class i.o.s.search.facet.CustomAvgAgg.Merger > [ERROR] > /home/markus/projects/openindex/solr/trunk/src/main/java/i.o.s.search/facet/CustomAvgAgg.java:[41,16] > cannot find symbol > [ERROR] symbol: class AvgSlotAcc > [ERROR] location: class i.o.s.search.facet.CustomAvgAgg > [ERROR] > /home/markus/projects/openindex/solr/trunk/src/main/java/i.o.s.search/facet/CustomAvgAgg.java:[46,12] > incompatible types: i.o.s.search.facet.CustomAvgAgg.Merger cannot be > converted to org.apache.solr.search.facet.FacetMerger > [ERROR] > /home/markus/projects/openindex/solr/trunk/src/main/java/i.o.s.search/facet/CustomAvgAgg.java:[53,5] > method does not override or implement a method from a supertype > [ERROR] > /home/markus/projects/openindex/solr/trunk/src/main/java/i.o.s.search/facet/CustomAvgAgg.java:[60,5] > method does not override or implement a method from a supertype > {code} > It seems lots of classes are tucked away in FacetModule, which we can't reach > from outside. 
> Originates from this thread: > http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201602.mbox/%3ccab_8yd9ldbg_0zxm_h1igkfm6bqeypd5ilyy7tty8cztscv...@mail.gmail.com%3E > ( also available at > https://lists.apache.org/thread.html/9fddcad3136ec908ce1c57881f8d3069e5d153f08b71f80f3e18d995%401455019826%40%3Csolr-user.lucene.apache.org%3E > ) -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Created] (SOLR-15035) core.properties different when using ADDREPLICA .vs. when the replica created with CREATE
Erick Erickson created SOLR-15035: - Summary: core.properties different when using ADDREPLICA .vs. when the replica created with CREATE Key: SOLR-15035 URL: https://issues.apache.org/jira/browse/SOLR-15035 Project: Solr Issue Type: Bug Security Level: Public (Default Security Level. Issues are Public) Components: SolrCloud Affects Versions: 8.7 Reporter: Erick Erickson I verified this after seeing it on the user's list. Here are the core.properties files: Note that numShards is missing from the replica created with ADDREPLICA. If anyone picks this up, there are lots of places in TestCollectionAPI that add a replica and could reach out to the core.properties files and check. What's not clear to me is whether numShards _should_ be in core.properties, but whether or not that's the case, we should be consistent. -Core created via CREATE #Written by CorePropertiesLocator #Tue Dec 08 14:01:13 UTC 2020 coreNodeName=core_node3 collection.configName=_default name=blivet_shard1_replica_n1 numShards=2 shard=shard1 collection=blivet replicaType=NRT [branch_8x] ~/apache/solr/solrtest8/solr/example/cloud/node1/solr$ cat blivet_shard1_replica_n5/core.properties -Core created via ADDREPLICA #Written by CorePropertiesLocator #Tue Dec 08 14:01:20 UTC 2020 coreNodeName=core_node6 collection.configName=_default name=blivet_shard1_replica_n5 shard=shard1 collection=blivet replicaType=NRT -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] gf2121 commented on pull request #2113: LUCENE-9629: Use computed masks
gf2121 commented on pull request #2113: URL: https://github.com/apache/lucene-solr/pull/2113#issuecomment-740674712 > I wonder if the difference in performance is observable since final long values would be inlined at compile time (and easily optimized for hotspot) whereas array accesses, even if locally cached, still have to be dynamic (I don't think the compiler is smart enough to detect constant array values?). Hi @dweiss ! Thers days I did some more benchmarks on this issue and get some 'amazing' result... First, i randomly choosed a decode method `decode15`, and try to find out if it will be slower in an array case. Here is the benchmark code based on JMH: ``` @State(Scope.Benchmark) public class MyBenchmark { private static final long MASK16_1 = 0x0001000100010001L; private static final long[] MASKS16_1 = new long[] {MASK16_1}; private static final long[] TMP = new long[128]; private static final long[] ARR = new long[128]; static { for (int i=0;i<128;i++) { TMP[i] = ARR[i] = i; } } public static void main(String[] args) throws RunnerException { Options opt = new OptionsBuilder() .include("MyBenchmark") .build(); new Runner(opt).run(); } @Benchmark @BenchmarkMode({Mode.Throughput}) @Fork(1) @Measurement(iterations = 10, time = 1, timeUnit = TimeUnit.SECONDS) @Warmup(iterations = 3, time = 1, timeUnit = TimeUnit.SECONDS) public static void decode0() { for (int iter = 0, tmpIdx = 0, longsIdx = 30; iter < 2; ++iter, tmpIdx += 15, longsIdx += 1) { long l0 = (TMP[tmpIdx+0] & MASKS16_1[0]) << 14; l0 |= (TMP[tmpIdx+1] & MASKS16_1[0]) << 13; l0 |= (TMP[tmpIdx+2] & MASKS16_1[0]) << 12; l0 |= (TMP[tmpIdx+3] & MASKS16_1[0]) << 11; l0 |= (TMP[tmpIdx+4] & MASKS16_1[0]) << 10; l0 |= (TMP[tmpIdx+5] & MASKS16_1[0]) << 9; l0 |= (TMP[tmpIdx+6] & MASKS16_1[0]) << 8; l0 |= (TMP[tmpIdx+7] & MASKS16_1[0]) << 7; l0 |= (TMP[tmpIdx+8] & MASKS16_1[0]) << 6; l0 |= (TMP[tmpIdx+9] & MASKS16_1[0]) << 5; l0 |= (TMP[tmpIdx+10] & MASKS16_1[0]) << 4; l0 |= (TMP[tmpIdx+11] & MASKS16_1[0]) << 3; l0 |= (TMP[tmpIdx+12] & MASKS16_1[0]) << 2; l0 |= (TMP[tmpIdx+13] & MASKS16_1[0]) << 1; l0 |= (TMP[tmpIdx+14] & MASKS16_1[0]) << 0; ARR[longsIdx+0] = l0; } } @Benchmark @BenchmarkMode({Mode.Throughput}) @Fork(1) @Measurement(iterations = 10, time = 1, timeUnit = TimeUnit.SECONDS) @Warmup(iterations = 3, time = 1, timeUnit = TimeUnit.SECONDS) public static void decode1() { for (int iter = 0, tmpIdx = 0, longsIdx = 30; iter < 2; ++iter, tmpIdx += 15, longsIdx += 1) { long l0 = (TMP[tmpIdx+0] & MASK16_1) << 14; l0 |= (TMP[tmpIdx+1] & MASK16_1) << 13; l0 |= (TMP[tmpIdx+2] & MASK16_1) << 12; l0 |= (TMP[tmpIdx+3] & MASK16_1) << 11; l0 |= (TMP[tmpIdx+4] & MASK16_1) << 10; l0 |= (TMP[tmpIdx+5] & MASK16_1) << 9; l0 |= (TMP[tmpIdx+6] & MASK16_1) << 8; l0 |= (TMP[tmpIdx+7] & MASK16_1) << 7; l0 |= (TMP[tmpIdx+8] & MASK16_1) << 6; l0 |= (TMP[tmpIdx+9] & MASK16_1) << 5; l0 |= (TMP[tmpIdx+10] & MASK16_1) << 4; l0 |= (TMP[tmpIdx+11] & MASK16_1) << 3; l0 |= (TMP[tmpIdx+12] & MASK16_1) << 2; l0 |= (TMP[tmpIdx+13] & MASK16_1) << 1; l0 |= (TMP[tmpIdx+14] & MASK16_1) << 0; ARR[longsIdx+0] = l0; } } @Benchmark @BenchmarkMode({Mode.Throughput}) @Fork(1) @Measurement(iterations = 10, time = 1, timeUnit = TimeUnit.SECONDS) @Warmup(iterations = 3, time = 1, timeUnit = TimeUnit.SECONDS) public static void decode2() { for (int iter = 0, tmpIdx = 0, longsIdx = 30; iter < 2; ++iter, tmpIdx += 15, longsIdx += 1) { long l0 = (TMP[tmpIdx+0] & 0x0001000100010001L) << 14; l0 |= (TMP[tmpIdx+1] & 0x0001000100010001L) << 13; l0 |= (TMP[tmpIdx+2] & 
0x0001000100010001L) << 12; l0 |= (TMP[tmpIdx+3] & 0x0001000100010001L) << 11; l0 |= (TMP[tmpIdx+4] & 0x0001000100010001L) << 10; l0 |= (TMP[tmpIdx+5] & 0x0001000100010001L) << 9; l0 |= (TMP[tmpIdx+6] & 0x0001000100010001L) << 8; l0 |= (TMP[tmpIdx+7] & 0x0001000100010001L) << 7; l0 |= (TMP[tmpIdx+8] & 0x0001000100010001L) << 6; l0 |= (TMP[tmpIdx+9] & 0x0001000100010001L) << 5; l0 |= (TMP[tmpIdx+10] & 0x00010
[GitHub] [lucene-solr] gf2121 edited a comment on pull request #2113: LUCENE-9629: Use computed masks
gf2121 edited a comment on pull request #2113: URL: https://github.com/apache/lucene-solr/pull/2113#issuecomment-740674712 > I wonder if the difference in performance is observable since final long values would be inlined at compile time (and easily optimized for hotspot) whereas array accesses, even if locally cached, still have to be dynamic (I don't think the compiler is smart enough to detect constant array values?). Hi @dweiss ! Thers days I did some more benchmarks on this issue and get some 'amazing' result... First, i randomly choosed a decode method `decode15`, and try to find out if it will be slower in an array case. Here is the benchmark code based on JMH: ``` @State(Scope.Benchmark) public class MyBenchmark { private static final long MASK16_1 = 0x0001000100010001L; private static final long[] MASKS16_1 = new long[] {MASK16_1}; private static final long[] TMP = new long[128]; private static final long[] ARR = new long[128]; static { for (int i=0;i<128;i++) { TMP[i] = ARR[i] = i; } } public static void main(String[] args) throws RunnerException { Options opt = new OptionsBuilder() .include("MyBenchmark") .build(); new Runner(opt).run(); } @Benchmark @BenchmarkMode({Mode.Throughput}) @Fork(1) @Measurement(iterations = 10, time = 1, timeUnit = TimeUnit.SECONDS) @Warmup(iterations = 3, time = 1, timeUnit = TimeUnit.SECONDS) public static void decode0() { for (int iter = 0, tmpIdx = 0, longsIdx = 30; iter < 2; ++iter, tmpIdx += 15, longsIdx += 1) { long l0 = (TMP[tmpIdx+0] & MASKS16_1[0]) << 14; l0 |= (TMP[tmpIdx+1] & MASKS16_1[0]) << 13; l0 |= (TMP[tmpIdx+2] & MASKS16_1[0]) << 12; l0 |= (TMP[tmpIdx+3] & MASKS16_1[0]) << 11; l0 |= (TMP[tmpIdx+4] & MASKS16_1[0]) << 10; l0 |= (TMP[tmpIdx+5] & MASKS16_1[0]) << 9; l0 |= (TMP[tmpIdx+6] & MASKS16_1[0]) << 8; l0 |= (TMP[tmpIdx+7] & MASKS16_1[0]) << 7; l0 |= (TMP[tmpIdx+8] & MASKS16_1[0]) << 6; l0 |= (TMP[tmpIdx+9] & MASKS16_1[0]) << 5; l0 |= (TMP[tmpIdx+10] & MASKS16_1[0]) << 4; l0 |= (TMP[tmpIdx+11] & MASKS16_1[0]) << 3; l0 |= (TMP[tmpIdx+12] & MASKS16_1[0]) << 2; l0 |= (TMP[tmpIdx+13] & MASKS16_1[0]) << 1; l0 |= (TMP[tmpIdx+14] & MASKS16_1[0]) << 0; ARR[longsIdx+0] = l0; } } @Benchmark @BenchmarkMode({Mode.Throughput}) @Fork(1) @Measurement(iterations = 10, time = 1, timeUnit = TimeUnit.SECONDS) @Warmup(iterations = 3, time = 1, timeUnit = TimeUnit.SECONDS) public static void decode1() { for (int iter = 0, tmpIdx = 0, longsIdx = 30; iter < 2; ++iter, tmpIdx += 15, longsIdx += 1) { long l0 = (TMP[tmpIdx+0] & MASK16_1) << 14; l0 |= (TMP[tmpIdx+1] & MASK16_1) << 13; l0 |= (TMP[tmpIdx+2] & MASK16_1) << 12; l0 |= (TMP[tmpIdx+3] & MASK16_1) << 11; l0 |= (TMP[tmpIdx+4] & MASK16_1) << 10; l0 |= (TMP[tmpIdx+5] & MASK16_1) << 9; l0 |= (TMP[tmpIdx+6] & MASK16_1) << 8; l0 |= (TMP[tmpIdx+7] & MASK16_1) << 7; l0 |= (TMP[tmpIdx+8] & MASK16_1) << 6; l0 |= (TMP[tmpIdx+9] & MASK16_1) << 5; l0 |= (TMP[tmpIdx+10] & MASK16_1) << 4; l0 |= (TMP[tmpIdx+11] & MASK16_1) << 3; l0 |= (TMP[tmpIdx+12] & MASK16_1) << 2; l0 |= (TMP[tmpIdx+13] & MASK16_1) << 1; l0 |= (TMP[tmpIdx+14] & MASK16_1) << 0; ARR[longsIdx+0] = l0; } } @Benchmark @BenchmarkMode({Mode.Throughput}) @Fork(1) @Measurement(iterations = 10, time = 1, timeUnit = TimeUnit.SECONDS) @Warmup(iterations = 3, time = 1, timeUnit = TimeUnit.SECONDS) public static void decode2() { for (int iter = 0, tmpIdx = 0, longsIdx = 30; iter < 2; ++iter, tmpIdx += 15, longsIdx += 1) { long l0 = (TMP[tmpIdx+0] & 0x0001000100010001L) << 14; l0 |= (TMP[tmpIdx+1] & 0x0001000100010001L) << 13; l0 |= (TMP[tmpIdx+2] & 
0x0001000100010001L) << 12; l0 |= (TMP[tmpIdx+3] & 0x0001000100010001L) << 11; l0 |= (TMP[tmpIdx+4] & 0x0001000100010001L) << 10; l0 |= (TMP[tmpIdx+5] & 0x0001000100010001L) << 9; l0 |= (TMP[tmpIdx+6] & 0x0001000100010001L) << 8; l0 |= (TMP[tmpIdx+7] & 0x0001000100010001L) << 7; l0 |= (TMP[tmpIdx+8] & 0x0001000100010001L) << 6; l0 |= (TMP[tmpIdx+9] & 0x0001000100010001L) << 5; l0 |= (TMP[tmpIdx+10] &
[GitHub] [lucene-solr] gf2121 edited a comment on pull request #2113: LUCENE-9629: Use computed masks
gf2121 edited a comment on pull request #2113: URL: https://github.com/apache/lucene-solr/pull/2113#issuecomment-740674712 > I wonder if the difference in performance is observable since final long values would be inlined at compile time (and easily optimized for hotspot) whereas array accesses, even if locally cached, still have to be dynamic (I don't think the compiler is smart enough to detect constant array values?). Hi @dweiss ! Thers days I did some more benchmarks on this issue and get some 'amazing' result... First, i randomly choosed a decode method `decode15`, and try to find out if it will be slower in an array case. Here is the benchmark code based on JMH: ``` @State(Scope.Benchmark) public class MyBenchmark { private static final long MASK16_1 = 0x0001000100010001L; private static final long[] MASKS16_1 = new long[] {MASK16_1}; private static final long[] TMP = new long[128]; private static final long[] ARR = new long[128]; static { for (int i=0;i<128;i++) { TMP[i] = ARR[i] = i; } } public static void main(String[] args) throws RunnerException { Options opt = new OptionsBuilder() .include("MyBenchmark") .build(); new Runner(opt).run(); } @Benchmark @BenchmarkMode({Mode.Throughput}) @Fork(1) @Measurement(iterations = 10, time = 1, timeUnit = TimeUnit.SECONDS) @Warmup(iterations = 3, time = 1, timeUnit = TimeUnit.SECONDS) public static void decode0() { for (int iter = 0, tmpIdx = 0, longsIdx = 30; iter < 2; ++iter, tmpIdx += 15, longsIdx += 1) { long l0 = (TMP[tmpIdx+0] & MASKS16_1[0]) << 14; l0 |= (TMP[tmpIdx+1] & MASKS16_1[0]) << 13; l0 |= (TMP[tmpIdx+2] & MASKS16_1[0]) << 12; l0 |= (TMP[tmpIdx+3] & MASKS16_1[0]) << 11; l0 |= (TMP[tmpIdx+4] & MASKS16_1[0]) << 10; l0 |= (TMP[tmpIdx+5] & MASKS16_1[0]) << 9; l0 |= (TMP[tmpIdx+6] & MASKS16_1[0]) << 8; l0 |= (TMP[tmpIdx+7] & MASKS16_1[0]) << 7; l0 |= (TMP[tmpIdx+8] & MASKS16_1[0]) << 6; l0 |= (TMP[tmpIdx+9] & MASKS16_1[0]) << 5; l0 |= (TMP[tmpIdx+10] & MASKS16_1[0]) << 4; l0 |= (TMP[tmpIdx+11] & MASKS16_1[0]) << 3; l0 |= (TMP[tmpIdx+12] & MASKS16_1[0]) << 2; l0 |= (TMP[tmpIdx+13] & MASKS16_1[0]) << 1; l0 |= (TMP[tmpIdx+14] & MASKS16_1[0]) << 0; ARR[longsIdx+0] = l0; } } @Benchmark @BenchmarkMode({Mode.Throughput}) @Fork(1) @Measurement(iterations = 10, time = 1, timeUnit = TimeUnit.SECONDS) @Warmup(iterations = 3, time = 1, timeUnit = TimeUnit.SECONDS) public static void decode1() { for (int iter = 0, tmpIdx = 0, longsIdx = 30; iter < 2; ++iter, tmpIdx += 15, longsIdx += 1) { long l0 = (TMP[tmpIdx+0] & MASK16_1) << 14; l0 |= (TMP[tmpIdx+1] & MASK16_1) << 13; l0 |= (TMP[tmpIdx+2] & MASK16_1) << 12; l0 |= (TMP[tmpIdx+3] & MASK16_1) << 11; l0 |= (TMP[tmpIdx+4] & MASK16_1) << 10; l0 |= (TMP[tmpIdx+5] & MASK16_1) << 9; l0 |= (TMP[tmpIdx+6] & MASK16_1) << 8; l0 |= (TMP[tmpIdx+7] & MASK16_1) << 7; l0 |= (TMP[tmpIdx+8] & MASK16_1) << 6; l0 |= (TMP[tmpIdx+9] & MASK16_1) << 5; l0 |= (TMP[tmpIdx+10] & MASK16_1) << 4; l0 |= (TMP[tmpIdx+11] & MASK16_1) << 3; l0 |= (TMP[tmpIdx+12] & MASK16_1) << 2; l0 |= (TMP[tmpIdx+13] & MASK16_1) << 1; l0 |= (TMP[tmpIdx+14] & MASK16_1) << 0; ARR[longsIdx+0] = l0; } } @Benchmark @BenchmarkMode({Mode.Throughput}) @Fork(1) @Measurement(iterations = 10, time = 1, timeUnit = TimeUnit.SECONDS) @Warmup(iterations = 3, time = 1, timeUnit = TimeUnit.SECONDS) public static void decode2() { for (int iter = 0, tmpIdx = 0, longsIdx = 30; iter < 2; ++iter, tmpIdx += 15, longsIdx += 1) { long l0 = (TMP[tmpIdx+0] & 0x0001000100010001L) << 14; l0 |= (TMP[tmpIdx+1] & 0x0001000100010001L) << 13; l0 |= (TMP[tmpIdx+2] & 
0x0001000100010001L) << 12; l0 |= (TMP[tmpIdx+3] & 0x0001000100010001L) << 11; l0 |= (TMP[tmpIdx+4] & 0x0001000100010001L) << 10; l0 |= (TMP[tmpIdx+5] & 0x0001000100010001L) << 9; l0 |= (TMP[tmpIdx+6] & 0x0001000100010001L) << 8; l0 |= (TMP[tmpIdx+7] & 0x0001000100010001L) << 7; l0 |= (TMP[tmpIdx+8] & 0x0001000100010001L) << 6; l0 |= (TMP[tmpIdx+9] & 0x0001000100010001L) << 5; l0 |= (TMP[tmpIdx+10] &
[GitHub] [lucene-solr] gf2121 edited a comment on pull request #2113: LUCENE-9629: Use computed masks
gf2121 edited a comment on pull request #2113: URL: https://github.com/apache/lucene-solr/pull/2113#issuecomment-740674712
> I wonder if the difference in performance is observable since final long values would be inlined at compile time (and easily optimized for hotspot) whereas array accesses, even if locally cached, still have to be dynamic (I don't think the compiler is smart enough to detect constant array values?).

Hi @dweiss ! These days I did some more benchmarks on this issue and got some 'amazing' results which I want to share with you :) First, I randomly chose a decode method, `decode15`, and tried to find out whether it would be slower in the array case. Here is the benchmark code, based on JMH:
```
@State(Scope.Benchmark)
public class MyBenchmark {

  private static final long MASK16_1 = 0x0001000100010001L;
  private static final long[] MASKS16_1 = new long[] {MASK16_1};
  private static final long[] TMP = new long[128];
  private static final long[] ARR = new long[128];

  static {
    for (int i = 0; i < 128; i++) {
      TMP[i] = ARR[i] = i;
    }
  }

  public static void main(String[] args) throws RunnerException {
    Options opt = new OptionsBuilder()
        .include("MyBenchmark")
        .build();
    new Runner(opt).run();
  }

  @Benchmark
  @BenchmarkMode({Mode.Throughput})
  @Fork(1)
  @Measurement(iterations = 10, time = 1, timeUnit = TimeUnit.SECONDS)
  @Warmup(iterations = 3, time = 1, timeUnit = TimeUnit.SECONDS)
  public static void decode0() {
    for (int iter = 0, tmpIdx = 0, longsIdx = 30; iter < 2; ++iter, tmpIdx += 15, longsIdx += 1) {
      long l0 = (TMP[tmpIdx+0] & MASKS16_1[0]) << 14;
      l0 |= (TMP[tmpIdx+1] & MASKS16_1[0]) << 13;
      l0 |= (TMP[tmpIdx+2] & MASKS16_1[0]) << 12;
      l0 |= (TMP[tmpIdx+3] & MASKS16_1[0]) << 11;
      l0 |= (TMP[tmpIdx+4] & MASKS16_1[0]) << 10;
      l0 |= (TMP[tmpIdx+5] & MASKS16_1[0]) << 9;
      l0 |= (TMP[tmpIdx+6] & MASKS16_1[0]) << 8;
      l0 |= (TMP[tmpIdx+7] & MASKS16_1[0]) << 7;
      l0 |= (TMP[tmpIdx+8] & MASKS16_1[0]) << 6;
      l0 |= (TMP[tmpIdx+9] & MASKS16_1[0]) << 5;
      l0 |= (TMP[tmpIdx+10] & MASKS16_1[0]) << 4;
      l0 |= (TMP[tmpIdx+11] & MASKS16_1[0]) << 3;
      l0 |= (TMP[tmpIdx+12] & MASKS16_1[0]) << 2;
      l0 |= (TMP[tmpIdx+13] & MASKS16_1[0]) << 1;
      l0 |= (TMP[tmpIdx+14] & MASKS16_1[0]) << 0;
      ARR[longsIdx+0] = l0;
    }
  }

  @Benchmark
  @BenchmarkMode({Mode.Throughput})
  @Fork(1)
  @Measurement(iterations = 10, time = 1, timeUnit = TimeUnit.SECONDS)
  @Warmup(iterations = 3, time = 1, timeUnit = TimeUnit.SECONDS)
  public static void decode1() {
    for (int iter = 0, tmpIdx = 0, longsIdx = 30; iter < 2; ++iter, tmpIdx += 15, longsIdx += 1) {
      long l0 = (TMP[tmpIdx+0] & MASK16_1) << 14;
      l0 |= (TMP[tmpIdx+1] & MASK16_1) << 13;
      l0 |= (TMP[tmpIdx+2] & MASK16_1) << 12;
      l0 |= (TMP[tmpIdx+3] & MASK16_1) << 11;
      l0 |= (TMP[tmpIdx+4] & MASK16_1) << 10;
      l0 |= (TMP[tmpIdx+5] & MASK16_1) << 9;
      l0 |= (TMP[tmpIdx+6] & MASK16_1) << 8;
      l0 |= (TMP[tmpIdx+7] & MASK16_1) << 7;
      l0 |= (TMP[tmpIdx+8] & MASK16_1) << 6;
      l0 |= (TMP[tmpIdx+9] & MASK16_1) << 5;
      l0 |= (TMP[tmpIdx+10] & MASK16_1) << 4;
      l0 |= (TMP[tmpIdx+11] & MASK16_1) << 3;
      l0 |= (TMP[tmpIdx+12] & MASK16_1) << 2;
      l0 |= (TMP[tmpIdx+13] & MASK16_1) << 1;
      l0 |= (TMP[tmpIdx+14] & MASK16_1) << 0;
      ARR[longsIdx+0] = l0;
    }
  }

  @Benchmark
  @BenchmarkMode({Mode.Throughput})
  @Fork(1)
  @Measurement(iterations = 10, time = 1, timeUnit = TimeUnit.SECONDS)
  @Warmup(iterations = 3, time = 1, timeUnit = TimeUnit.SECONDS)
  public static void decode2() {
    for (int iter = 0, tmpIdx = 0, longsIdx = 30; iter < 2; ++iter, tmpIdx += 15, longsIdx += 1) {
      long l0 = (TMP[tmpIdx+0] & 0x0001000100010001L) << 14;
      l0 |= (TMP[tmpIdx+1] & 0x0001000100010001L) << 13;
      l0 |= (TMP[tmpIdx+2] & 0x0001000100010001L) << 12;
      l0 |= (TMP[tmpIdx+3] & 0x0001000100010001L) << 11;
      l0 |= (TMP[tmpIdx+4] & 0x0001000100010001L) << 10;
      l0 |= (TMP[tmpIdx+5] & 0x0001000100010001L) << 9;
      l0 |= (TMP[tmpIdx+6] & 0x0001000100010001L) << 8;
      l0 |= (TMP[tmpIdx+7] & 0x0001000100010001L) << 7;
      l0 |= (TMP[tmpIdx+8] & 0x0001000100010001L) << 6;
      l0 |= (TMP[tmpIdx+9] & 0x0001000100010001L) << 5;
```
[jira] [Created] (SOLR-15036) Use plist automatically for executing a facet expression against a collection alias backed by multiple collections
Timothy Potter created SOLR-15036: - Summary: Use plist automatically for executing a facet expression against a collection alias backed by multiple collections Key: SOLR-15036 URL: https://issues.apache.org/jira/browse/SOLR-15036 Project: Solr Issue Type: Improvement Security Level: Public (Default Security Level. Issues are Public) Components: streaming expressions Reporter: Timothy Potter Assignee: Timothy Potter Attachments: relay-approach.patch For analytics use cases, streaming expressions make it possible to compute basic aggregations (count, min, max, sum, and avg) over massive data sets. Moreover, with massive data sets, it is common to use collection aliases over many underlying collections, for instance time-partitioned aliases backed by a set of collections, each covering a specific time range. In some cases, we can end up with many collections (think 50-60) each with 100's of shards. Aliases help insulate client applications from complex collection topologies on the server side. Let's take a basic facet expression that computes some useful aggregation metrics: {code:java} facet( some_alias, q="*:*", fl="a_i", sort="a_i asc", buckets="a_i", bucketSorts="count(*) asc", bucketSizeLimit=1, sum(a_d), avg(a_d), min(a_d), max(a_d), count(*) ) {code} Behind the scenes, the {{FacetStream}} sends a JSON facet request to Solr which then expands the alias to a list of collections. For each collection, the top-level distributed query controller gathers a candidate set of replicas to query and then scatters {{distrib=false}} queries to each replica in the list. For instance, if we have 60 collections with 200 shards each, then this results in 12,000 shard requests from the query controller node to the other nodes in the cluster. The requests are sent in an async manner (see {{SearchHandler}} and {{HttpShardHandler}}) In my testing, we’ve seen cases where we hit 18,000 replicas and these queries don’t always come back in a timely manner. Put simply, this also puts a lot of load on the top-level query controller node in terms of open connections and new object creation. Instead, we can use {{plist}} to send the JSON facet query to each collection in the alias in parallel, which reduces the overhead of each top-level distributed query from 12,000 to 200 in my example above. With this approach, you’ll then need to sort the tuples back from each collection and do a rollup, something like: {code:java} select( rollup( sort( plist( select(facet(coll1,q="*:*", fl="a_i", sort="a_i asc", buckets="a_i", bucketSorts="count(*) asc", bucketSizeLimit=1, sum(a_d), avg(a_d), min(a_d), max(a_d), count(*)),a_i,sum(a_d) as the_sum, avg(a_d) as the_avg, min(a_d) as the_min, max(a_d) as the_max, count(*) as cnt), select(facet(coll2,q="*:*", fl="a_i", sort="a_i asc", buckets="a_i", bucketSorts="count(*) asc", bucketSizeLimit=1, sum(a_d), avg(a_d), min(a_d), max(a_d), count(*)),a_i,sum(a_d) as the_sum, avg(a_d) as the_avg, min(a_d) as the_min, max(a_d) as the_max, count(*) as cnt) ), by="a_i asc" ), over="a_i", sum(the_sum), avg(the_avg), min(the_min), max(the_max), sum(cnt) ), a_i, sum(the_sum) as the_sum, avg(the_avg) as the_avg, min(the_min) as the_min, max(the_max) as the_max, sum(cnt) as cnt ) {code} One thing to point out is that you can’t just avg. the averages back from each collection in the rollup. It needs to be a *weighted avg.* when rolling up the avg. from each facet expression in the plist. 
However, we have the count per collection, so this is doable but will require some changes to the rollup expression to support weighted average. While this plist approach is doable, it’s a pain for users to have to create the rollup / sort over plist expression for collection aliases. After all, aliases are supposed to hide these types of complexities from client applications! The point of this ticket is to investigate the feasibility of auto-wrapping the facet expression with a rollup / sort / plist when the collection argument is an alias with multiple collections; other stream sources will be considered after facet is proven out. Lastly, I also considered an alternative approach of doing a parallel relay on the server side. The idea is similar to {{plist}} but instead of this being driven on the client side, the {{FacetModule}} can create intermediate queries (I called them {{relay}} queries in my impl.) that help distribute the load. In my example above, there would be 60 such relay queries, each sent to a replica for each collection in the alias, which then sends the {{distrib=false}} queries to each replica. The relay query response handler collects the facet responses from each replica before sending back to the top-level query
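A quick illustration of the weighted-average point above, in plain Java rather than a streaming expression (a hypothetical sketch with made-up numbers, not Solr code): merging per-collection averages only works if each average is weighted by its bucket count.
```
// Hypothetical sketch, not Solr code: merging per-collection (count, avg)
// pairs for one facet bucket. The naive mean of the averages is wrong; the
// correct merge weights each average by its count.
public class WeightedAvgSketch {

  static double weightedAvg(long[] counts, double[] avgs) {
    long totalCount = 0;
    double weightedSum = 0.0;
    for (int i = 0; i < counts.length; i++) {
      totalCount += counts[i];
      weightedSum += avgs[i] * counts[i]; // recovers each collection's sum(a_d)
    }
    return totalCount == 0 ? 0.0 : weightedSum / totalCount;
  }

  public static void main(String[] args) {
    // coll1: 10 docs with avg 2.0, coll2: 990 docs with avg 10.0
    System.out.println(weightedAvg(new long[] {10, 990}, new double[] {2.0, 10.0})); // 9.92
    // the naive (2.0 + 10.0) / 2 = 6.0 would badly misrepresent the alias-wide average
  }
}
```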
[jira] [Commented] (SOLR-15036) Use plist automatically for executing a facet expression against a collection alias backed by multiple collections
[ https://issues.apache.org/jira/browse/SOLR-15036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17246017#comment-17246017 ] Atri Sharma commented on SOLR-15036: I havent looked at the patch yet – but why not do a drill expression and wrap it with the aggregate to be computed? Would that not achieve the objective to push down aggregation to shards? > Use plist automatically for executing a facet expression against a collection > alias backed by multiple collections > -- > > Key: SOLR-15036 > URL: https://issues.apache.org/jira/browse/SOLR-15036 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: streaming expressions >Reporter: Timothy Potter >Assignee: Timothy Potter >Priority: Major > Attachments: relay-approach.patch > > > For analytics use cases, streaming expressions make it possible to compute > basic aggregations (count, min, max, sum, and avg) over massive data sets. > Moreover, with massive data sets, it is common to use collection aliases over > many underlying collections, for instance time-partitioned aliases backed by > a set of collections, each covering a specific time range. In some cases, we > can end up with many collections (think 50-60) each with 100's of shards. > Aliases help insulate client applications from complex collection topologies > on the server side. > Let's take a basic facet expression that computes some useful aggregation > metrics: > {code:java} > facet( > some_alias, > q="*:*", > fl="a_i", > sort="a_i asc", > buckets="a_i", > bucketSorts="count(*) asc", > bucketSizeLimit=1, > sum(a_d), avg(a_d), min(a_d), max(a_d), count(*) > ) > {code} > Behind the scenes, the {{FacetStream}} sends a JSON facet request to Solr > which then expands the alias to a list of collections. For each collection, > the top-level distributed query controller gathers a candidate set of > replicas to query and then scatters {{distrib=false}} queries to each replica > in the list. For instance, if we have 60 collections with 200 shards each, > then this results in 12,000 shard requests from the query controller node to > the other nodes in the cluster. The requests are sent in an async manner (see > {{SearchHandler}} and {{HttpShardHandler}}) In my testing, we’ve seen cases > where we hit 18,000 replicas and these queries don’t always come back in a > timely manner. Put simply, this also puts a lot of load on the top-level > query controller node in terms of open connections and new object creation. > Instead, we can use {{plist}} to send the JSON facet query to each collection > in the alias in parallel, which reduces the overhead of each top-level > distributed query from 12,000 to 200 in my example above. 
With this approach, > you’ll then need to sort the tuples back from each collection and do a > rollup, something like: > {code:java} > select( > rollup( > sort( > plist( > select(facet(coll1,q="*:*", fl="a_i", sort="a_i asc", buckets="a_i", > bucketSorts="count(*) asc", bucketSizeLimit=1, sum(a_d), avg(a_d), > min(a_d), max(a_d), count(*)),a_i,sum(a_d) as the_sum, avg(a_d) as the_avg, > min(a_d) as the_min, max(a_d) as the_max, count(*) as cnt), > select(facet(coll2,q="*:*", fl="a_i", sort="a_i asc", buckets="a_i", > bucketSorts="count(*) asc", bucketSizeLimit=1, sum(a_d), avg(a_d), > min(a_d), max(a_d), count(*)),a_i,sum(a_d) as the_sum, avg(a_d) as the_avg, > min(a_d) as the_min, max(a_d) as the_max, count(*) as cnt) > ), > by="a_i asc" > ), > over="a_i", > sum(the_sum), avg(the_avg), min(the_min), max(the_max), sum(cnt) > ), > a_i, sum(the_sum) as the_sum, avg(the_avg) as the_avg, min(the_min) as > the_min, max(the_max) as the_max, sum(cnt) as cnt > ) > {code} > One thing to point out is that you can’t just avg. the averages back from > each collection in the rollup. It needs to be a *weighted avg.* when rolling > up the avg. from each facet expression in the plist. However, we have the > count per collection, so this is doable but will require some changes to the > rollup expression to support weighted average. > While this plist approach is doable, it’s a pain for users to have to create > the rollup / sort over plist expression for collection aliases. After all, > aliases are supposed to hide these types of complexities from client > applications! > The point of this ticket is to investigate the feasibility of auto-wrapping > the facet expression with a rollup / sort / plist when the collection > argument is an alias with multiple collections; other stream sources will be > consid
[jira] [Commented] (SOLR-15036) Use plist automatically for executing a facet expression against a collection alias backed by multiple collections
[ https://issues.apache.org/jira/browse/SOLR-15036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17246031#comment-17246031 ] Timothy Potter commented on SOLR-15036: --- [~atri] the patch isn't the solution I'm going for, as I explained in the description ... Regarding drill, I'll have to investigate its performance compared to {{plist}} and {{facet}}. However, since it's based on {{/export}} seems like it would be a lot of I/O out each Solr node instead of just relying on the efficient JSON facet implementation? I certainly don't want to {{/export}} 1B rows to count them when I can just facet instead. What would a {{drill}} expression look like that does the same as my example in the description? > Use plist automatically for executing a facet expression against a collection > alias backed by multiple collections > -- > > Key: SOLR-15036 > URL: https://issues.apache.org/jira/browse/SOLR-15036 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: streaming expressions >Reporter: Timothy Potter >Assignee: Timothy Potter >Priority: Major > Attachments: relay-approach.patch > > > For analytics use cases, streaming expressions make it possible to compute > basic aggregations (count, min, max, sum, and avg) over massive data sets. > Moreover, with massive data sets, it is common to use collection aliases over > many underlying collections, for instance time-partitioned aliases backed by > a set of collections, each covering a specific time range. In some cases, we > can end up with many collections (think 50-60) each with 100's of shards. > Aliases help insulate client applications from complex collection topologies > on the server side. > Let's take a basic facet expression that computes some useful aggregation > metrics: > {code:java} > facet( > some_alias, > q="*:*", > fl="a_i", > sort="a_i asc", > buckets="a_i", > bucketSorts="count(*) asc", > bucketSizeLimit=1, > sum(a_d), avg(a_d), min(a_d), max(a_d), count(*) > ) > {code} > Behind the scenes, the {{FacetStream}} sends a JSON facet request to Solr > which then expands the alias to a list of collections. For each collection, > the top-level distributed query controller gathers a candidate set of > replicas to query and then scatters {{distrib=false}} queries to each replica > in the list. For instance, if we have 60 collections with 200 shards each, > then this results in 12,000 shard requests from the query controller node to > the other nodes in the cluster. The requests are sent in an async manner (see > {{SearchHandler}} and {{HttpShardHandler}}) In my testing, we’ve seen cases > where we hit 18,000 replicas and these queries don’t always come back in a > timely manner. Put simply, this also puts a lot of load on the top-level > query controller node in terms of open connections and new object creation. > Instead, we can use {{plist}} to send the JSON facet query to each collection > in the alias in parallel, which reduces the overhead of each top-level > distributed query from 12,000 to 200 in my example above. 
With this approach, > you’ll then need to sort the tuples back from each collection and do a > rollup, something like: > {code:java} > select( > rollup( > sort( > plist( > select(facet(coll1,q="*:*", fl="a_i", sort="a_i asc", buckets="a_i", > bucketSorts="count(*) asc", bucketSizeLimit=1, sum(a_d), avg(a_d), > min(a_d), max(a_d), count(*)),a_i,sum(a_d) as the_sum, avg(a_d) as the_avg, > min(a_d) as the_min, max(a_d) as the_max, count(*) as cnt), > select(facet(coll2,q="*:*", fl="a_i", sort="a_i asc", buckets="a_i", > bucketSorts="count(*) asc", bucketSizeLimit=1, sum(a_d), avg(a_d), > min(a_d), max(a_d), count(*)),a_i,sum(a_d) as the_sum, avg(a_d) as the_avg, > min(a_d) as the_min, max(a_d) as the_max, count(*) as cnt) > ), > by="a_i asc" > ), > over="a_i", > sum(the_sum), avg(the_avg), min(the_min), max(the_max), sum(cnt) > ), > a_i, sum(the_sum) as the_sum, avg(the_avg) as the_avg, min(the_min) as > the_min, max(the_max) as the_max, sum(cnt) as cnt > ) > {code} > One thing to point out is that you can’t just avg. the averages back from > each collection in the rollup. It needs to be a *weighted avg.* when rolling > up the avg. from each facet expression in the plist. However, we have the > count per collection, so this is doable but will require some changes to the > rollup expression to support weighted average. > While this plist approach is doable, it’s a pain for users to have to create > the rollup / sort over plist expression fo
[jira] [Commented] (SOLR-10732) potential optimizations in callers of SolrIndexSearcher.numDocs when docset is empty
[ https://issues.apache.org/jira/browse/SOLR-10732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17246106#comment-17246106 ] Michael Gibney commented on SOLR-10732: --- I'm curious, [~munendrasn] -- were you able to perceive a performance benefit with these changes? Where these optimizations are located, afaict they optimize edge cases, and the query-building they prevent (if I'm reading right) is generally pretty lightweight (e.g., {{TermQuery}} ...). It seems like it makes most sense to optimize this kind of thing either at the leaf level (i.e., in {{SolrIndexSearcher.numDocs(...)}} -- already done in SOLR-10727) or maybe also higher up in the program logic, to prune as much execution as possible (and when it's clearer how/why we got the point of having an empty domain). The changes here seem to be building in mid-level "shot in the dark" safeguards, where it's relatively unclear what's going on. By way of contrast (wrt complexity/benefit tradeoff), at the leaf level it looks like {{SolrIndexSearcher.getDocSet(Query, DocSet)}} could be optimized in a way analogous to what SOLR-10727 does for {{SolrIndexSearcher.numDocs(Query, DocSet)}}, avoiding filterCache pollution ... > potential optimizations in callers of SolrIndexSearcher.numDocs when docset > is empty > > > Key: SOLR-10732 > URL: https://issues.apache.org/jira/browse/SOLR-10732 > Project: Solr > Issue Type: Improvement >Reporter: Chris M. Hostetter >Priority: Major > Attachments: SOLR-10732.patch > > Time Spent: 1.5h > Remaining Estimate: 0h > > spin off of SOLR-10727... > {quote} > ...why not (also) optimize it slightly higher up and completely avoid the > construction of the Query objects? (and in some cases: additional overhead) > for example: the first usage of {{SolrIndexSearcher.numDocs(Query,DocSet)}} i > found was {{RangeFacetProcessor.rangeCount(DocSet subset,...)}} ... if the > first line of that method was {{if (0 == subset.size()) return 0}} then we'd > not only optimize away the SolrIndexSearcher hit, but also fetching the > SchemaField & building the range query (not to mention the much more > expensive {{getGroupedFacetQueryCount}} in the grouping case) > At a glance, most other callers of > {{SolrIndexSearcher.numDocs(Query,DocSet)}} could be trivially optimize this > way as well -- at a minimum to eliminate Query parsing/construction. > {quote} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] dweiss commented on a change in pull request #2127: LUCENE-9633: Improve match highlighter behavior for degenerate intervals
dweiss commented on a change in pull request #2127: URL: https://github.com/apache/lucene-solr/pull/2127#discussion_r538796480 ## File path: lucene/highlighter/src/test/org/apache/lucene/search/matchhighlight/TestMatchRegionRetriever.java ## @@ -361,6 +374,41 @@ public void testIntervalQueries() throws IOException { ); } + @Test + public void testDegenerateIntervalsWithPositions() throws IOException { +testDegenerateIntervals(FLD_TEXT_POS); + } + + @Test @AwaitsFix(bugUrl = "https://issues.apache.org/jira/browse/LUCENE-9634: " + Review comment: I may provide a PR with those query parser changes I made if there's interest - they're not that difficult and they make it possible to use intervals from plain text queries. I'll get to it. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] madrob commented on a change in pull request #2121: SOLR-10860: Return proper error code for bad input incase of inplace updates
madrob commented on a change in pull request #2121: URL: https://github.com/apache/lucene-solr/pull/2121#discussion_r538804398 ## File path: solr/core/src/java/org/apache/solr/update/processor/AtomicUpdateDocumentMerger.java ## @@ -143,6 +147,15 @@ public SolrInputDocument merge(final SolrInputDocument fromDoc, SolrInputDocumen return toDoc; } + private static String getID(SolrInputDocument doc, IndexSchema schema) { +String id = ""; Review comment: can we default to `(unknown id)` otherwise the error message will look weird I think. ## File path: solr/core/src/java/org/apache/solr/update/processor/AtomicUpdateDocumentMerger.java ## @@ -553,7 +574,15 @@ private Object getNativeFieldValue(String fieldName, Object val) { return val; } SchemaField sf = schema.getField(fieldName); -return sf.getType().toNativeType(val); +try { + return sf.getType().toNativeType(val); +} catch (SolrException ex) { + throw new SolrException(SolrException.ErrorCode.getErrorCode(ex.code()), + "Error converting field '" + sf.getName() + "'='" +val+"' to native type, msg=" + ex.getMessage(), ex); Review comment: I don't think we want `msg` copied since it will be in the cause anyway. ## File path: solr/core/src/test/org/apache/solr/update/TestInPlaceUpdatesStandalone.java ## @@ -121,6 +123,36 @@ public void deleteAllAndCommit() throws Exception { assertU(commit("softCommit", "false")); } + @Test + public void testUpdateBadRequest() throws Exception { +final long version1 = addAndGetVersion(sdoc("id", "1", "title_s", "first", "inplace_updatable_float", 41), null); +assertU(commit()); + +// invalid value with set operation +SolrException e = expectThrows(SolrException.class, +() -> addAndAssertVersion(version1, "id", "1", "inplace_updatable_float", map("set", "NOT_NUMBER"))); +assertEquals(SolrException.ErrorCode.BAD_REQUEST.code, e.code()); +MatcherAssert.assertThat(e.getMessage(), containsString("For input string: \"NOT_NUMBER\"")); + +// invalid value with inc operation +e = expectThrows(SolrException.class, +() -> addAndAssertVersion(version1, "id", "1", "inplace_updatable_float", map("inc", "NOT_NUMBER"))); +assertEquals(SolrException.ErrorCode.BAD_REQUEST.code, e.code()); +MatcherAssert.assertThat(e.getMessage(), containsString("For input string: \"NOT_NUMBER\"")); + +// inc op with null value +e = expectThrows(SolrException.class, +() -> addAndAssertVersion(version1, "id", "1", "inplace_updatable_float", map("inc", null))); +assertEquals(SolrException.ErrorCode.BAD_REQUEST.code, e.code()); +MatcherAssert.assertThat(e.getMessage(), containsString("Invalid input 'null' for field inplace_updatable_float")); + +e = expectThrows(SolrException.class, +() -> addAndAssertVersion(version1, "id", "1", "inplace_updatable_float", Review comment: This surprises me a little bit that we can't increment a float by an integer amount? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] madrob commented on a change in pull request #2118: SOLR-15031: Prevent null being wrapped in a QueryValueSource
madrob commented on a change in pull request #2118: URL: https://github.com/apache/lucene-solr/pull/2118#discussion_r538812418 ## File path: solr/core/src/java/org/apache/solr/search/FunctionQParser.java ## @@ -361,7 +361,9 @@ protected ValueSource parseValueSource(int flags) throws SyntaxError { ((FunctionQParser)subParser).setParseMultipleSources(true); } Query subQuery = subParser.getQuery(); - if (subQuery instanceof FunctionQuery) { + if (subQuery == null) { +valueSource = new DoubleConstValueSource(0.0f); + } else if (subQuery instanceof FunctionQuery) { valueSource = ((FunctionQuery) subQuery).getValueSource(); } else { valueSource = new QueryValueSource(subQuery, 0.0f); Review comment: Should we add a test in QueryValueSource constructor to require non-null? ## File path: solr/core/src/java/org/apache/solr/search/FunctionQParser.java ## @@ -361,7 +361,9 @@ protected ValueSource parseValueSource(int flags) throws SyntaxError { ((FunctionQParser)subParser).setParseMultipleSources(true); } Query subQuery = subParser.getQuery(); - if (subQuery instanceof FunctionQuery) { + if (subQuery == null) { +valueSource = new DoubleConstValueSource(0.0f); Review comment: Why a Double? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
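If a constructor-level guard is added along the lines madrob suggests, it could be as small as a `requireNonNull` check. The sketch below is illustrative only: a stand-in class, not the actual Lucene `QueryValueSource`.
```
import java.util.Objects;

// Illustrative stand-in, not the real Lucene class: fail fast when a null
// query is passed, instead of letting it surface later as an NPE at search time.
public class QueryBackedValueSource {
  private final Object query;  // stands in for org.apache.lucene.search.Query
  private final float defVal;

  public QueryBackedValueSource(Object query, float defVal) {
    this.query = Objects.requireNonNull(query, "query must not be null");
    this.defVal = defVal;
  }

  public static void main(String[] args) {
    new QueryBackedValueSource(new Object(), 0.0f); // ok
    new QueryBackedValueSource(null, 0.0f);         // throws NullPointerException with the message above
  }
}
```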
[GitHub] [lucene-solr] madrob commented on pull request #2118: SOLR-15031: Prevent null being wrapped in a QueryValueSource
madrob commented on pull request #2118: URL: https://github.com/apache/lucene-solr/pull/2118#issuecomment-741045328 Overall the fix is definitely good, and I think it's correct, just a few minor questions about it for completeness. Thank you for opening the PR! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-15036) Use plist automatically for executing a facet expression against a collection alias backed by multiple collections
[ https://issues.apache.org/jira/browse/SOLR-15036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17246157#comment-17246157 ] Joel Bernstein commented on SOLR-15036: --- I can comment on the drill vs facet. Facet will always be faster than drill except in the high cardinality use case. Drill really shines in the high cardinality use case though. Rather than sending all tuples to the aggregator node, drill can first aggregate inside of the export handler and compress the result significantly before hitting the network. And drill never runs out of memory. More work is coming that improves the export handler performance by about 300%. But even this improvement doesn't allow drill to match the speed of facet on low cardinality aggregations. > Use plist automatically for executing a facet expression against a collection > alias backed by multiple collections > -- > > Key: SOLR-15036 > URL: https://issues.apache.org/jira/browse/SOLR-15036 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: streaming expressions >Reporter: Timothy Potter >Assignee: Timothy Potter >Priority: Major > Attachments: relay-approach.patch > > > For analytics use cases, streaming expressions make it possible to compute > basic aggregations (count, min, max, sum, and avg) over massive data sets. > Moreover, with massive data sets, it is common to use collection aliases over > many underlying collections, for instance time-partitioned aliases backed by > a set of collections, each covering a specific time range. In some cases, we > can end up with many collections (think 50-60) each with 100's of shards. > Aliases help insulate client applications from complex collection topologies > on the server side. > Let's take a basic facet expression that computes some useful aggregation > metrics: > {code:java} > facet( > some_alias, > q="*:*", > fl="a_i", > sort="a_i asc", > buckets="a_i", > bucketSorts="count(*) asc", > bucketSizeLimit=1, > sum(a_d), avg(a_d), min(a_d), max(a_d), count(*) > ) > {code} > Behind the scenes, the {{FacetStream}} sends a JSON facet request to Solr > which then expands the alias to a list of collections. For each collection, > the top-level distributed query controller gathers a candidate set of > replicas to query and then scatters {{distrib=false}} queries to each replica > in the list. For instance, if we have 60 collections with 200 shards each, > then this results in 12,000 shard requests from the query controller node to > the other nodes in the cluster. The requests are sent in an async manner (see > {{SearchHandler}} and {{HttpShardHandler}}) In my testing, we’ve seen cases > where we hit 18,000 replicas and these queries don’t always come back in a > timely manner. Put simply, this also puts a lot of load on the top-level > query controller node in terms of open connections and new object creation. > Instead, we can use {{plist}} to send the JSON facet query to each collection > in the alias in parallel, which reduces the overhead of each top-level > distributed query from 12,000 to 200 in my example above. 
With this approach, > you’ll then need to sort the tuples back from each collection and do a > rollup, something like: > {code:java} > select( > rollup( > sort( > plist( > select(facet(coll1,q="*:*", fl="a_i", sort="a_i asc", buckets="a_i", > bucketSorts="count(*) asc", bucketSizeLimit=1, sum(a_d), avg(a_d), > min(a_d), max(a_d), count(*)),a_i,sum(a_d) as the_sum, avg(a_d) as the_avg, > min(a_d) as the_min, max(a_d) as the_max, count(*) as cnt), > select(facet(coll2,q="*:*", fl="a_i", sort="a_i asc", buckets="a_i", > bucketSorts="count(*) asc", bucketSizeLimit=1, sum(a_d), avg(a_d), > min(a_d), max(a_d), count(*)),a_i,sum(a_d) as the_sum, avg(a_d) as the_avg, > min(a_d) as the_min, max(a_d) as the_max, count(*) as cnt) > ), > by="a_i asc" > ), > over="a_i", > sum(the_sum), avg(the_avg), min(the_min), max(the_max), sum(cnt) > ), > a_i, sum(the_sum) as the_sum, avg(the_avg) as the_avg, min(the_min) as > the_min, max(the_max) as the_max, sum(cnt) as cnt > ) > {code} > One thing to point out is that you can’t just avg. the averages back from > each collection in the rollup. It needs to be a *weighted avg.* when rolling > up the avg. from each facet expression in the plist. However, we have the > count per collection, so this is doable but will require some changes to the > rollup expression to support weighted average. > While this plist approach is doable, it’s a pain for users to have to create > the
[GitHub] [lucene-solr] madrob commented on pull request #2113: LUCENE-9629: Use computed masks
madrob commented on pull request #2113: URL: https://github.com/apache/lucene-solr/pull/2113#issuecomment-741047880 You need to either return a value from the benchmark methods or call blackhole.consume, otherwise the JVM will detect that everything is unused outside of the scope and optimize it away. That should get you some different results. Thank you for being thorough! This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
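For reference, a minimal sketch of the two options madrob describes, using standard JMH APIs. The decode loop below is a simplified stand-in for the `decode15`-style methods in the benchmark above, not the exact code: either return the computed value (JMH consumes it implicitly) or hand it to a `Blackhole` so the JIT cannot prove the work is dead.
```
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.infra.Blackhole;

// Simplified stand-in for the decode benchmark, showing how to keep JMH from
// dead-code-eliminating the decode work.
@State(Scope.Benchmark)
public class DecodeBenchmarkFixed {

  private static final long MASK16_1 = 0x0001000100010001L;
  private final long[] tmp = new long[128];
  private final long[] arr = new long[128];

  // Option 1: return the computed value so JMH consumes it.
  @Benchmark
  @BenchmarkMode(Mode.Throughput)
  @Fork(1)
  public long decodeReturning() {
    long l0 = 0;
    for (int i = 0; i < 15; i++) {
      l0 |= (tmp[i] & MASK16_1) << (14 - i);
    }
    return l0;
  }

  // Option 2: hand the result to a Blackhole so nothing looks unused.
  @Benchmark
  @BenchmarkMode(Mode.Throughput)
  @Fork(1)
  public void decodeConsuming(Blackhole bh) {
    long l0 = 0;
    for (int i = 0; i < 15; i++) {
      l0 |= (tmp[i] & MASK16_1) << (14 - i);
    }
    arr[0] = l0;
    bh.consume(arr);
  }
}
```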
[jira] [Comment Edited] (SOLR-15036) Use plist automatically for executing a facet expression against a collection alias backed by multiple collections
[ https://issues.apache.org/jira/browse/SOLR-15036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17246157#comment-17246157 ] Joel Bernstein edited comment on SOLR-15036 at 12/8/20, 9:18 PM: - I can comment on the drill vs facet question. Facet will always be faster than drill except in the high cardinality use case. Drill really shines in the high cardinality use case though. Rather than sending all tuples to the aggregator node, drill can first aggregate inside of the export handler and compress the result significantly before hitting the network. And drill never runs out of memory. More work is coming that improves the export handler performance by about 300%. But even this improvement doesn't allow drill to match the speed of facet on low cardinality aggregations. was (Author: joel.bernstein): I can comment on the drill vs facet. Facet will always be faster than drill except in the high cardinality use case. Drill really shines in the high cardinality use case though. Rather than sending all tuples to the aggregator node, drill can first aggregate inside of the export handler and compress the result significantly before hitting the network. And drill never runs out of memory. More work is coming that improves the export handler performance by about 300%. But even this improvement doesn't allow drill to match the speed of facet on low cardinality aggregations. > Use plist automatically for executing a facet expression against a collection > alias backed by multiple collections > -- > > Key: SOLR-15036 > URL: https://issues.apache.org/jira/browse/SOLR-15036 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: streaming expressions >Reporter: Timothy Potter >Assignee: Timothy Potter >Priority: Major > Attachments: relay-approach.patch > > > For analytics use cases, streaming expressions make it possible to compute > basic aggregations (count, min, max, sum, and avg) over massive data sets. > Moreover, with massive data sets, it is common to use collection aliases over > many underlying collections, for instance time-partitioned aliases backed by > a set of collections, each covering a specific time range. In some cases, we > can end up with many collections (think 50-60) each with 100's of shards. > Aliases help insulate client applications from complex collection topologies > on the server side. > Let's take a basic facet expression that computes some useful aggregation > metrics: > {code:java} > facet( > some_alias, > q="*:*", > fl="a_i", > sort="a_i asc", > buckets="a_i", > bucketSorts="count(*) asc", > bucketSizeLimit=1, > sum(a_d), avg(a_d), min(a_d), max(a_d), count(*) > ) > {code} > Behind the scenes, the {{FacetStream}} sends a JSON facet request to Solr > which then expands the alias to a list of collections. For each collection, > the top-level distributed query controller gathers a candidate set of > replicas to query and then scatters {{distrib=false}} queries to each replica > in the list. For instance, if we have 60 collections with 200 shards each, > then this results in 12,000 shard requests from the query controller node to > the other nodes in the cluster. The requests are sent in an async manner (see > {{SearchHandler}} and {{HttpShardHandler}}) In my testing, we’ve seen cases > where we hit 18,000 replicas and these queries don’t always come back in a > timely manner. 
Put simply, this also puts a lot of load on the top-level > query controller node in terms of open connections and new object creation. > Instead, we can use {{plist}} to send the JSON facet query to each collection > in the alias in parallel, which reduces the overhead of each top-level > distributed query from 12,000 to 200 in my example above. With this approach, > you’ll then need to sort the tuples back from each collection and do a > rollup, something like: > {code:java} > select( > rollup( > sort( > plist( > select(facet(coll1,q="*:*", fl="a_i", sort="a_i asc", buckets="a_i", > bucketSorts="count(*) asc", bucketSizeLimit=1, sum(a_d), avg(a_d), > min(a_d), max(a_d), count(*)),a_i,sum(a_d) as the_sum, avg(a_d) as the_avg, > min(a_d) as the_min, max(a_d) as the_max, count(*) as cnt), > select(facet(coll2,q="*:*", fl="a_i", sort="a_i asc", buckets="a_i", > bucketSorts="count(*) asc", bucketSizeLimit=1, sum(a_d), avg(a_d), > min(a_d), max(a_d), count(*)),a_i,sum(a_d) as the_sum, avg(a_d) as the_avg, > min(a_d) as the_min, max(a_d) as the_max, count(*) as cnt) > ), > by="a_i asc" > ), > ove
[jira] [Commented] (SOLR-14688) First party package implementation design
[ https://issues.apache.org/jira/browse/SOLR-14688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17246171#comment-17246171 ] Noble Paul commented on SOLR-14688: --- Yes David, that's a missing piece. When there are multiple versions of a package available, Solr should pick up the compatible version eg: package v1 is compatible with solr 8.5 to 8.8 and package v2 is compatible with solr 8.9 to solr 9.5. if a node is started with solr 8.8, it should use v1 and if a node is started with solr 9, it should pick package v2 > First party package implementation design > - > > Key: SOLR-14688 > URL: https://issues.apache.org/jira/browse/SOLR-14688 > Project: Solr > Issue Type: Improvement >Reporter: Noble Paul >Priority: Major > Labels: package, packagemanager > > Here's the design document for first party packages: > https://docs.google.com/document/d/1n7gB2JAdZhlJKFrCd4Txcw4HDkdk7hlULyAZBS-wXrE/edit?usp=sharing > Put differently, this is about package-ifying our "contribs". -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
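A hypothetical sketch of the selection rule described above, with illustrative names only (not the actual package manager code): among the available package versions, choose the newest one whose declared Solr compatibility range contains the node's version.
```
import java.util.List;
import java.util.Optional;

// Hypothetical sketch of picking a compatible package version; Solr versions
// are encoded as ints (85 = 8.5, 90 = 9.0) purely for illustration.
public class PackageVersionPicker {

  record PackageVersion(String version, int minSolr, int maxSolr) {}

  static Optional<PackageVersion> pick(List<PackageVersion> available, int nodeSolrVersion) {
    return available.stream()
        .filter(p -> nodeSolrVersion >= p.minSolr() && nodeSolrVersion <= p.maxSolr())
        .max((a, b) -> a.version().compareTo(b.version()));
  }

  public static void main(String[] args) {
    List<PackageVersion> versions = List.of(
        new PackageVersion("v1", 85, 88),  // compatible with Solr 8.5 - 8.8
        new PackageVersion("v2", 89, 95)); // compatible with Solr 8.9 - 9.5
    System.out.println(pick(versions, 88)); // selects v1
    System.out.println(pick(versions, 90)); // selects v2
  }
}
```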
[jira] [Commented] (SOLR-15036) Use plist automatically for executing a facet expression against a collection alias backed by multiple collections
[ https://issues.apache.org/jira/browse/SOLR-15036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17246172#comment-17246172 ] Michael Gibney commented on SOLR-15036: --- [~thelabdude], you mention "JSON facet implementation", which I gather (according to the refguide) is under the hood of the [facet streaming expression|https://lucene.apache.org/solr/guide/8_7/stream-source-reference.html#facet]. [~jbernste], you imply that "facet" sends "all tuples to the aggregator node". I'm confused here, because that implication contradicts my understanding of what the "JSON facet" implementation does (i.e., shard-level aggregation first, merging on coordinator node, optional shard-level refinement). Perhaps I'm missing something about the specific way in which the {{facet}} streaming expression wraps "JSON facet" functionality? Also, when you say "high cardinality use case", roughly how high is "high", and are you referring to high cardinality wrt DocSet domain size, or number of unique values in a field? > Use plist automatically for executing a facet expression against a collection > alias backed by multiple collections > -- > > Key: SOLR-15036 > URL: https://issues.apache.org/jira/browse/SOLR-15036 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: streaming expressions >Reporter: Timothy Potter >Assignee: Timothy Potter >Priority: Major > Attachments: relay-approach.patch > > > For analytics use cases, streaming expressions make it possible to compute > basic aggregations (count, min, max, sum, and avg) over massive data sets. > Moreover, with massive data sets, it is common to use collection aliases over > many underlying collections, for instance time-partitioned aliases backed by > a set of collections, each covering a specific time range. In some cases, we > can end up with many collections (think 50-60) each with 100's of shards. > Aliases help insulate client applications from complex collection topologies > on the server side. > Let's take a basic facet expression that computes some useful aggregation > metrics: > {code:java} > facet( > some_alias, > q="*:*", > fl="a_i", > sort="a_i asc", > buckets="a_i", > bucketSorts="count(*) asc", > bucketSizeLimit=1, > sum(a_d), avg(a_d), min(a_d), max(a_d), count(*) > ) > {code} > Behind the scenes, the {{FacetStream}} sends a JSON facet request to Solr > which then expands the alias to a list of collections. For each collection, > the top-level distributed query controller gathers a candidate set of > replicas to query and then scatters {{distrib=false}} queries to each replica > in the list. For instance, if we have 60 collections with 200 shards each, > then this results in 12,000 shard requests from the query controller node to > the other nodes in the cluster. The requests are sent in an async manner (see > {{SearchHandler}} and {{HttpShardHandler}}) In my testing, we’ve seen cases > where we hit 18,000 replicas and these queries don’t always come back in a > timely manner. Put simply, this also puts a lot of load on the top-level > query controller node in terms of open connections and new object creation. > Instead, we can use {{plist}} to send the JSON facet query to each collection > in the alias in parallel, which reduces the overhead of each top-level > distributed query from 12,000 to 200 in my example above. 
With this approach, > you’ll then need to sort the tuples back from each collection and do a > rollup, something like: > {code:java} > select( > rollup( > sort( > plist( > select(facet(coll1,q="*:*", fl="a_i", sort="a_i asc", buckets="a_i", > bucketSorts="count(*) asc", bucketSizeLimit=1, sum(a_d), avg(a_d), > min(a_d), max(a_d), count(*)),a_i,sum(a_d) as the_sum, avg(a_d) as the_avg, > min(a_d) as the_min, max(a_d) as the_max, count(*) as cnt), > select(facet(coll2,q="*:*", fl="a_i", sort="a_i asc", buckets="a_i", > bucketSorts="count(*) asc", bucketSizeLimit=1, sum(a_d), avg(a_d), > min(a_d), max(a_d), count(*)),a_i,sum(a_d) as the_sum, avg(a_d) as the_avg, > min(a_d) as the_min, max(a_d) as the_max, count(*) as cnt) > ), > by="a_i asc" > ), > over="a_i", > sum(the_sum), avg(the_avg), min(the_min), max(the_max), sum(cnt) > ), > a_i, sum(the_sum) as the_sum, avg(the_avg) as the_avg, min(the_min) as > the_min, max(the_max) as the_max, sum(cnt) as cnt > ) > {code} > One thing to point out is that you can’t just avg. the averages back from > each collection in the rollup. It needs to be a *weighted avg.* when rolling > up the avg. from each fa
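To make the shard-level aggregation and coordinator merge referenced above concrete, here is a toy sketch in plain Java (not the JSON facet implementation): each shard reports its own bucket counts and the coordinator sums them per bucket; refinement only matters when shards truncate their per-shard bucket lists and a bucket's count might be missing from some shard.
```
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy sketch, not Solr code: merging per-shard facet bucket counts on the
// coordinator by summing counts for matching bucket values.
public class FacetMergeSketch {

  static Map<String, Long> merge(List<Map<String, Long>> perShardCounts) {
    Map<String, Long> merged = new HashMap<>();
    for (Map<String, Long> shard : perShardCounts) {
      shard.forEach((bucket, cnt) -> merged.merge(bucket, cnt, Long::sum));
    }
    return merged;
  }

  public static void main(String[] args) {
    System.out.println(merge(List.of(
        Map.of("red", 10L, "blue", 3L),
        Map.of("red", 7L, "green", 2L))));
    // red=17, blue=3, green=2 (iteration order may vary)
  }
}
```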
[GitHub] [lucene-solr] tflobbe commented on a change in pull request #2120: SOLR-15029 More gracefully give up shard leadership
tflobbe commented on a change in pull request #2120: URL: https://github.com/apache/lucene-solr/pull/2120#discussion_r538843558 ## File path: solr/core/src/java/org/apache/solr/handler/admin/CollectionsHandler.java ## @@ -1306,7 +1306,7 @@ private static void forceLeaderElection(SolrQueryRequest req, CollectionsHandler try (ZkShardTerms zkShardTerms = new ZkShardTerms(collectionName, slice.getName(), zkController.getZkClient())) { // if an active replica is the leader, then all is fine already Replica leader = slice.getLeader(); - if (leader != null && leader.getState() == State.ACTIVE) { + if (leader != null && leader.getState() == State.ACTIVE && zkShardTerms.getHighestTerm() == zkShardTerms.getTerm(leader.getName())) { Review comment: I know this is not new code, but should we change `leader.getState() == State.ACTIVE` to `leader.isActive(liveNodes)`? ## File path: solr/core/src/java/org/apache/solr/util/TestInjection.java ## @@ -337,6 +342,39 @@ public static boolean injectFailUpdateRequests() { return true; } + + public static boolean injectLeaderTragedy(SolrCore core) { Review comment: What's the point of the return value? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-15036) Use plist automatically for executing a facet expression against a collection alias backed by multiple collections
[ https://issues.apache.org/jira/browse/SOLR-15036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17246198#comment-17246198 ] Joel Bernstein commented on SOLR-15036: --- In the high cardinality use case, faceting will eventually run into performance and memory problems, so I don't really consider it a great high cardinality solution. Not because it sends all tuples to aggregator nodes, but because it's an in-memory aggregation. I was comparing Streaming Expressions, prior to drill, when I mentioned sending all tuples to the aggregator nodes. Streaming Expressions, prior to drill, could use the export handler to send all sorted tuples to the aggregator node and accomplish high cardinality aggregation. So, drill improves on previous implementations of Streaming Expressions by first aggregating inside the export handler. > Use plist automatically for executing a facet expression against a collection > alias backed by multiple collections > -- > > Key: SOLR-15036 > URL: https://issues.apache.org/jira/browse/SOLR-15036 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: streaming expressions >Reporter: Timothy Potter >Assignee: Timothy Potter >Priority: Major > Attachments: relay-approach.patch > > > For analytics use cases, streaming expressions make it possible to compute > basic aggregations (count, min, max, sum, and avg) over massive data sets. > Moreover, with massive data sets, it is common to use collection aliases over > many underlying collections, for instance time-partitioned aliases backed by > a set of collections, each covering a specific time range. In some cases, we > can end up with many collections (think 50-60) each with 100's of shards. > Aliases help insulate client applications from complex collection topologies > on the server side. > Let's take a basic facet expression that computes some useful aggregation > metrics: > {code:java} > facet( > some_alias, > q="*:*", > fl="a_i", > sort="a_i asc", > buckets="a_i", > bucketSorts="count(*) asc", > bucketSizeLimit=1, > sum(a_d), avg(a_d), min(a_d), max(a_d), count(*) > ) > {code} > Behind the scenes, the {{FacetStream}} sends a JSON facet request to Solr > which then expands the alias to a list of collections. For each collection, > the top-level distributed query controller gathers a candidate set of > replicas to query and then scatters {{distrib=false}} queries to each replica > in the list. For instance, if we have 60 collections with 200 shards each, > then this results in 12,000 shard requests from the query controller node to > the other nodes in the cluster. The requests are sent in an async manner (see > {{SearchHandler}} and {{HttpShardHandler}}) In my testing, we’ve seen cases > where we hit 18,000 replicas and these queries don’t always come back in a > timely manner. Put simply, this also puts a lot of load on the top-level > query controller node in terms of open connections and new object creation. > Instead, we can use {{plist}} to send the JSON facet query to each collection > in the alias in parallel, which reduces the overhead of each top-level > distributed query from 12,000 to 200 in my example above. 
With this approach, > you’ll then need to sort the tuples back from each collection and do a > rollup, something like: > {code:java} > select( > rollup( > sort( > plist( > select(facet(coll1,q="*:*", fl="a_i", sort="a_i asc", buckets="a_i", > bucketSorts="count(*) asc", bucketSizeLimit=1, sum(a_d), avg(a_d), > min(a_d), max(a_d), count(*)),a_i,sum(a_d) as the_sum, avg(a_d) as the_avg, > min(a_d) as the_min, max(a_d) as the_max, count(*) as cnt), > select(facet(coll2,q="*:*", fl="a_i", sort="a_i asc", buckets="a_i", > bucketSorts="count(*) asc", bucketSizeLimit=1, sum(a_d), avg(a_d), > min(a_d), max(a_d), count(*)),a_i,sum(a_d) as the_sum, avg(a_d) as the_avg, > min(a_d) as the_min, max(a_d) as the_max, count(*) as cnt) > ), > by="a_i asc" > ), > over="a_i", > sum(the_sum), avg(the_avg), min(the_min), max(the_max), sum(cnt) > ), > a_i, sum(the_sum) as the_sum, avg(the_avg) as the_avg, min(the_min) as > the_min, max(the_max) as the_max, sum(cnt) as cnt > ) > {code} > One thing to point out is that you can’t just avg. the averages back from > each collection in the rollup. It needs to be a *weighted avg.* when rolling > up the avg. from each facet expression in the plist. However, we have the > count per collection, so this is doable but will require some changes to the > rollup expression to support weighted average. >
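A toy sketch of why aggregating over an already-sorted stream (the drill / export-handler approach described above) stays memory-bounded, in plain Java with made-up tuple types: only the current group's running totals are held, no matter how many distinct keys stream past.
```
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// Toy sketch, not Solr code: rolling up a stream of (key, value) tuples that
// is already sorted by key, emitting each group as soon as the key changes.
public class SortedRollupSketch {

  static void rollup(Iterator<Map.Entry<String, Double>> sortedTuples) {
    String currentKey = null;
    double sum = 0;
    long count = 0;
    while (sortedTuples.hasNext()) {
      Map.Entry<String, Double> t = sortedTuples.next();
      if (currentKey != null && !currentKey.equals(t.getKey())) {
        System.out.println(currentKey + " -> sum=" + sum + " count=" + count);
        sum = 0;
        count = 0;
      }
      currentKey = t.getKey();
      sum += t.getValue();
      count++;
    }
    if (currentKey != null) {
      System.out.println(currentKey + " -> sum=" + sum + " count=" + count);
    }
  }

  public static void main(String[] args) {
    rollup(List.of(
        Map.entry("a", 1.0), Map.entry("a", 2.0),
        Map.entry("b", 5.0)).iterator());
    // prints: a -> sum=3.0 count=2, then b -> sum=5.0 count=1
  }
}
```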
[jira] [Comment Edited] (SOLR-15036) Use plist automatically for executing a facet expression against a collection alias backed by multiple collections
[ https://issues.apache.org/jira/browse/SOLR-15036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17246157#comment-17246157 ] Joel Bernstein edited comment on SOLR-15036 at 12/9/20, 12:21 AM: -- I can comment on the drill vs facet question. Facet will always be faster than drill except in the high cardinality use case. Drill really shines in the high cardinality use case though. Rather than sending all tuples to the aggregator node, and using the rollup Stream, drill can first aggregate inside of the export handler and compress the result significantly before hitting the network. And drill never runs out of memory, where faceting will eventually run out of memory. More work is coming that improves the export handler performance by about 300%. But even this improvement doesn't allow drill to match the speed of facet on low cardinality aggregations. was (Author: joel.bernstein): I can comment on the drill vs facet question. Facet will always be faster than drill except in the high cardinality use case. Drill really shines in the high cardinality use case though. Rather than sending all tuples to the aggregator node, drill can first aggregate inside of the export handler and compress the result significantly before hitting the network. And drill never runs out of memory. More work is coming that improves the export handler performance by about 300%. But even this improvement doesn't allow drill to match the speed of facet on low cardinality aggregations. > Use plist automatically for executing a facet expression against a collection > alias backed by multiple collections > -- > > Key: SOLR-15036 > URL: https://issues.apache.org/jira/browse/SOLR-15036 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: streaming expressions >Reporter: Timothy Potter >Assignee: Timothy Potter >Priority: Major > Attachments: relay-approach.patch > > > For analytics use cases, streaming expressions make it possible to compute > basic aggregations (count, min, max, sum, and avg) over massive data sets. > Moreover, with massive data sets, it is common to use collection aliases over > many underlying collections, for instance time-partitioned aliases backed by > a set of collections, each covering a specific time range. In some cases, we > can end up with many collections (think 50-60) each with 100's of shards. > Aliases help insulate client applications from complex collection topologies > on the server side. > Let's take a basic facet expression that computes some useful aggregation > metrics: > {code:java} > facet( > some_alias, > q="*:*", > fl="a_i", > sort="a_i asc", > buckets="a_i", > bucketSorts="count(*) asc", > bucketSizeLimit=1, > sum(a_d), avg(a_d), min(a_d), max(a_d), count(*) > ) > {code} > Behind the scenes, the {{FacetStream}} sends a JSON facet request to Solr > which then expands the alias to a list of collections. For each collection, > the top-level distributed query controller gathers a candidate set of > replicas to query and then scatters {{distrib=false}} queries to each replica > in the list. For instance, if we have 60 collections with 200 shards each, > then this results in 12,000 shard requests from the query controller node to > the other nodes in the cluster. The requests are sent in an async manner (see > {{SearchHandler}} and {{HttpShardHandler}}) In my testing, we’ve seen cases > where we hit 18,000 replicas and these queries don’t always come back in a > timely manner. 
Put simply, this also puts a lot of load on the top-level > query controller node in terms of open connections and new object creation. > Instead, we can use {{plist}} to send the JSON facet query to each collection > in the alias in parallel, which reduces the overhead of each top-level > distributed query from 12,000 to 200 in my example above. With this approach, > you’ll then need to sort the tuples back from each collection and do a > rollup, something like: > {code:java} > select( > rollup( > sort( > plist( > select(facet(coll1,q="*:*", fl="a_i", sort="a_i asc", buckets="a_i", > bucketSorts="count(*) asc", bucketSizeLimit=1, sum(a_d), avg(a_d), > min(a_d), max(a_d), count(*)),a_i,sum(a_d) as the_sum, avg(a_d) as the_avg, > min(a_d) as the_min, max(a_d) as the_max, count(*) as cnt), > select(facet(coll2,q="*:*", fl="a_i", sort="a_i asc", buckets="a_i", > bucketSorts="count(*) asc", bucketSizeLimit=1, sum(a_d), avg(a_d), > min(a_d), max(a_d), count(*)),a_i,sum(a_d) as the_sum, avg(a_d) as the_avg, > min(a_d) as the_mi
[jira] [Comment Edited] (SOLR-15036) Use plist automatically for executing a facet expression against a collection alias backed by multiple collections
[ https://issues.apache.org/jira/browse/SOLR-15036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17246198#comment-17246198 ] Joel Bernstein edited comment on SOLR-15036 at 12/9/20, 12:22 AM: -- In the high cardinality use case, faceting will eventually run into performance and memory problems, so I don't really consider it a great high cardinality solution. Not because it sends all tuples to aggregator nodes, but because it's an in-memory aggregation. I was comparing Streaming Expressions, prior to drill, when I mentioned sending all tuples to the aggregator nodes. Streaming Expressions, prior to drill, could use the export handler to send all sorted tuples to the aggregator node and accomplish high cardinality aggregation. So, drill improves on previous implementations of Streaming Expressions by first aggregating inside the export handler. Just updated my prior comment to make this more clear. was (Author: joel.bernstein): In the high cardinality use case, faceting will eventually run into performance and memory problems, so I don't really consider it a great high cardinality solution. Not because it sends all tuples to aggregator nodes, but because it's an in-memory aggregation. I was comparing Streaming Expressions, prior to drill, when I mentioned sending all tuples to the aggregator nodes. Streaming Expressions, prior to drill, could use the export handler to send all sorted tuples to the aggregator node and accomplish high cardinality aggregation. So, drill improves on previous implementations of Streaming Expressions by first aggregating inside the export handler. > Use plist automatically for executing a facet expression against a collection > alias backed by multiple collections > -- > > Key: SOLR-15036 > URL: https://issues.apache.org/jira/browse/SOLR-15036 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: streaming expressions >Reporter: Timothy Potter >Assignee: Timothy Potter >Priority: Major > Attachments: relay-approach.patch > > > For analytics use cases, streaming expressions make it possible to compute > basic aggregations (count, min, max, sum, and avg) over massive data sets. > Moreover, with massive data sets, it is common to use collection aliases over > many underlying collections, for instance time-partitioned aliases backed by > a set of collections, each covering a specific time range. In some cases, we > can end up with many collections (think 50-60) each with 100's of shards. > Aliases help insulate client applications from complex collection topologies > on the server side. > Let's take a basic facet expression that computes some useful aggregation > metrics: > {code:java} > facet( > some_alias, > q="*:*", > fl="a_i", > sort="a_i asc", > buckets="a_i", > bucketSorts="count(*) asc", > bucketSizeLimit=1, > sum(a_d), avg(a_d), min(a_d), max(a_d), count(*) > ) > {code} > Behind the scenes, the {{FacetStream}} sends a JSON facet request to Solr > which then expands the alias to a list of collections. For each collection, > the top-level distributed query controller gathers a candidate set of > replicas to query and then scatters {{distrib=false}} queries to each replica > in the list. For instance, if we have 60 collections with 200 shards each, > then this results in 12,000 shard requests from the query controller node to > the other nodes in the cluster. 
The requests are sent in an async manner (see > {{SearchHandler}} and {{HttpShardHandler}}) In my testing, we’ve seen cases > where we hit 18,000 replicas and these queries don’t always come back in a > timely manner. Put simply, this also puts a lot of load on the top-level > query controller node in terms of open connections and new object creation. > Instead, we can use {{plist}} to send the JSON facet query to each collection > in the alias in parallel, which reduces the overhead of each top-level > distributed query from 12,000 to 200 in my example above. With this approach, > you’ll then need to sort the tuples back from each collection and do a > rollup, something like: > {code:java} > select( > rollup( > sort( > plist( > select(facet(coll1,q="*:*", fl="a_i", sort="a_i asc", buckets="a_i", > bucketSorts="count(*) asc", bucketSizeLimit=1, sum(a_d), avg(a_d), > min(a_d), max(a_d), count(*)),a_i,sum(a_d) as the_sum, avg(a_d) as the_avg, > min(a_d) as the_min, max(a_d) as the_max, count(*) as cnt), > select(facet(coll2,q="*:*", fl="a_i", sort="a_i asc", buckets="a_i", > bucketSorts="count(*) asc", bucketSizeLimit
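To make the drill comparison in the comment above concrete, here is a rough sketch of a drill-based aggregation (illustrative only: the collection and field names are assumed, and the exact syntax should be verified against the Streaming Expressions documentation for the Solr version in use). The inner rollup runs inside the export handler on each shard, so only partial aggregates per bucket are streamed to the aggregator node, which combines them with the outer rollup:
{code:java}
rollup(
  drill(
    some_collection,
    q="*:*",
    fl="a_i",
    sort="a_i asc",
    rollup(input(), over="a_i", count(*))
  ),
  over="a_i",
  sum(count(*))
)
{code}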
[jira] [Commented] (SOLR-7964) suggest.highlight=true does not work when using context filter query
[ https://issues.apache.org/jira/browse/SOLR-7964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17246219#comment-17246219 ] Graham Sutton commented on SOLR-7964: - Any progress on getting this incorporated into one of the upcoming official releases? I am still encountering this issue in 8.5.2. > suggest.highlight=true does not work when using context filter query > > > Key: SOLR-7964 > URL: https://issues.apache.org/jira/browse/SOLR-7964 > Project: Solr > Issue Type: Improvement > Components: Suggester >Affects Versions: 5.4 >Reporter: Arcadius Ahouansou >Assignee: David Smiley >Priority: Minor > Labels: suggester > Attachments: SOLR-7964.patch, SOLR_7964.patch, SOLR_7964.patch > > > When using the new suggester context filtering query param > {{suggest.contextFilterQuery}} introduced in SOLR-7888, the param > {{suggest.highlight=true}} has no effect. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] thelabdude opened a new pull request #2132: SOLR-15036: auto-select / rollup / sort / plist over facet expression when using a collection alias with multiple collections
thelabdude opened a new pull request #2132: URL: https://github.com/apache/lucene-solr/pull/2132 # Description Quick impl to show the concept discussed in the JIRA, more tests required ... Pretty non-invasive to the existing codebase in my opinion thus far ;-) Also want to try to generalize some of this auto-plist stuff for use with different stream sources. # Solution Please provide a short description of the approach taken to implement your solution. # Tests Please describe the tests you've developed or run to confirm this patch implements the feature or solves the problem. # Checklist Please review the following and check all that apply: - [ ] I have reviewed the guidelines for [How to Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms to the standards described there to the best of my ability. - [ ] I have created a Jira issue and added the issue ID to my pull request title. - [ ] I have given Solr maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended) - [ ] I have developed this patch against the `master` branch. - [ ] I have run `./gradlew check`. - [ ] I have added tests for my changes. - [ ] I have added documentation for the [Ref Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) (for Solr changes only). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
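As a reader's aid, the expression this PR aims to generate automatically is the plist/rollup/sort/select wrapping sketched in the JIRA description. A compact, hand-written version for an alias backed by just two collections might look roughly like the following (collection names, field names, and limits are illustrative; the actual generated expression may differ):
```
select(
  rollup(
    sort(
      plist(
        select(facet(coll1, q="*:*", fl="a_i", sort="a_i asc", buckets="a_i", bucketSorts="count(*) asc", bucketSizeLimit=100, sum(a_d), count(*)),
               a_i, sum(a_d) as the_sum, count(*) as cnt),
        select(facet(coll2, q="*:*", fl="a_i", sort="a_i asc", buckets="a_i", bucketSorts="count(*) asc", bucketSizeLimit=100, sum(a_d), count(*)),
               a_i, sum(a_d) as the_sum, count(*) as cnt)
      ),
      by="a_i asc"
    ),
    over="a_i",
    sum(the_sum), sum(cnt)
  ),
  a_i, sum(the_sum) as the_sum, sum(cnt) as cnt
)
```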
[jira] [Commented] (SOLR-15036) Use plist automatically for executing a facet expression against a collection alias backed by multiple collections
[ https://issues.apache.org/jira/browse/SOLR-15036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17246228#comment-17246228 ] Michael Gibney commented on SOLR-15036: --- Thanks for the clarification, [~jbernste]. Would you be able to give a rough sense of how high you consider to be high cardinality, and whether you're talking about high cardinality _domain_ (DocSet size) or _field_ (number of unique values)? Apologies (and I hope/trust this isn't off-topic for this issue), but "faceting will eventually run into performance and memory problems ... because it's an in-memory aggregation" -- in a sense all aggregation is an in-memory aggregation, it's just a question of how aggressively the accumulation data structure is pruned (unless {{drill}} is writing to disk?). I'm honestly having a hard time wrapping my head around cases in which {{drill}} would perform better than "JSON facet", esp. considering the fundamental distinction that an exportWriter-based impl would work with BytesRefs (right?), whereas "JSON facet" generally works against term ords (at the shard level). Hence my questions about "how high is high" wrt cardinality, etc. ... hoping that will help me better understand the performance characteristics you're describing. > Use plist automatically for executing a facet expression against a collection > alias backed by multiple collections > -- > > Key: SOLR-15036 > URL: https://issues.apache.org/jira/browse/SOLR-15036 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: streaming expressions >Reporter: Timothy Potter >Assignee: Timothy Potter >Priority: Major > Attachments: relay-approach.patch > > Time Spent: 10m > Remaining Estimate: 0h > > For analytics use cases, streaming expressions make it possible to compute > basic aggregations (count, min, max, sum, and avg) over massive data sets. > Moreover, with massive data sets, it is common to use collection aliases over > many underlying collections, for instance time-partitioned aliases backed by > a set of collections, each covering a specific time range. In some cases, we > can end up with many collections (think 50-60) each with 100's of shards. > Aliases help insulate client applications from complex collection topologies > on the server side. > Let's take a basic facet expression that computes some useful aggregation > metrics: > {code:java} > facet( > some_alias, > q="*:*", > fl="a_i", > sort="a_i asc", > buckets="a_i", > bucketSorts="count(*) asc", > bucketSizeLimit=1, > sum(a_d), avg(a_d), min(a_d), max(a_d), count(*) > ) > {code} > Behind the scenes, the {{FacetStream}} sends a JSON facet request to Solr > which then expands the alias to a list of collections. For each collection, > the top-level distributed query controller gathers a candidate set of > replicas to query and then scatters {{distrib=false}} queries to each replica > in the list. For instance, if we have 60 collections with 200 shards each, > then this results in 12,000 shard requests from the query controller node to > the other nodes in the cluster. The requests are sent in an async manner (see > {{SearchHandler}} and {{HttpShardHandler}}) In my testing, we’ve seen cases > where we hit 18,000 replicas and these queries don’t always come back in a > timely manner. Put simply, this also puts a lot of load on the top-level > query controller node in terms of open connections and new object creation. 
> Instead, we can use {{plist}} to send the JSON facet query to each collection > in the alias in parallel, which reduces the overhead of each top-level > distributed query from 12,000 to 200 in my example above. With this approach, > you’ll then need to sort the tuples back from each collection and do a > rollup, something like: > {code:java} > select( > rollup( > sort( > plist( > select(facet(coll1,q="*:*", fl="a_i", sort="a_i asc", buckets="a_i", > bucketSorts="count(*) asc", bucketSizeLimit=1, sum(a_d), avg(a_d), > min(a_d), max(a_d), count(*)),a_i,sum(a_d) as the_sum, avg(a_d) as the_avg, > min(a_d) as the_min, max(a_d) as the_max, count(*) as cnt), > select(facet(coll2,q="*:*", fl="a_i", sort="a_i asc", buckets="a_i", > bucketSorts="count(*) asc", bucketSizeLimit=1, sum(a_d), avg(a_d), > min(a_d), max(a_d), count(*)),a_i,sum(a_d) as the_sum, avg(a_d) as the_avg, > min(a_d) as the_min, max(a_d) as the_max, count(*) as cnt) > ), > by="a_i asc" > ), > over="a_i", > sum(the_sum), avg(the_avg), min(the_min), max(the_max), sum(cnt) > ), > a_i, sum(the_sum) as the
[jira] [Commented] (SOLR-14848) Demonstrate how Solr 8, master, or any previous Solr version pales next to the reference branch.
[ https://issues.apache.org/jira/browse/SOLR-14848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17246278#comment-17246278 ] Mark Robert Miller commented on SOLR-14848: --- Whew, okay, this issue is finally queuing up. The Solr ref branch phase 1 will be called “complete” on Friday. Like any milestone, that is really to my definition, but milestones are milestones and it’s an important one for me. This issue is prime phase 2, alongside some Nightly and merge up work that can move along in parallel. > Demonstrate how Solr 8, master, or any version previous Solr version before > pales next to the reference branch. > --- > > Key: SOLR-14848 > URL: https://issues.apache.org/jira/browse/SOLR-14848 > Project: Solr > Issue Type: Sub-task >Reporter: Mark Robert Miller >Priority: Major > > I've got a lot of code here and I have and will be claiming that it's an > order of magnitude better than what has come before. > I've been too busy and will be busy for a bit, so I have not been too > concerned about backing that up really at all. Most people have no clue what > I have here, some people have an inkling, some people are just totally > confused, some people think I maybe have some fast tests, or a slightly more > stable system, or maybe some neato performance changes, or even maybe some > poorly coded speed hacks. Maybe one or two has a more hope filled guess. > Almost everyone will think, "all that new code, mostly done by a single > person? I know a lot of smart and smarter devs, who cares what this guy is up > to. Why would I leave the safety of the branch I know and feel safe with? By > definition, the existing stuff is the battle hardened, tried and true leader, > and how are you going to come in here without disrupting our comfortable > thing?" > Well, fair enough. I won't try to come and disrupt anything. Instead, there > will be benchmarks, stress tests, chaos monkeys, long term endurance tests, > and all sorts of fun competitions. Spy vs Spy. I mean Solr vs Solr. > And while this vanilla version of my previous work has avoided a lot of great > changes and improvements I can make (a "remastered" Solr sensible, initial > mandate that puts a hand or two behind my back) ... > ... The reference branch will trounce previous versions of Solr in benchmark > after benchmark. It will keep pumping through endurance tests and performance > challenges at impressive speed while Solr proper will struggle to finish in a > reasonable time or almost certainly, often enough, simply fail to complete > the task. The reference branch will devour available resources and fly > through work. Solr master will struggle and meander, sometimes in the wrong > direction, while leaving the hardware with gobs of idle cpu to chill with > (unless it's using most of the cpu for garbage collection at some points). > This is not meant to brag or dis previous versions of Solr. I was heavily > involved in building them. This is the result of dedication and time more > than any of my brilliance - the above is simply meant to state the path that > I see coming. As this comparison information and other experiences and > stories start to emerge, that master branch won't look nearly so safe or > comfortable anymore. And it's at that point that we will find out if anyone > is interested in testing our tolerance for disruption by trying to figure out > how to get master into the reference branch as opposed to the other way > around. 
> > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14788) Solr: The Next Big Thing
[ https://issues.apache.org/jira/browse/SOLR-14788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17246280#comment-17246280 ] Mark Robert Miller commented on SOLR-14788: --- The flywheel defender is in rare form. Next week we will start to see what this hack code can do more concretely. > Solr: The Next Big Thing > > > Key: SOLR-14788 > URL: https://issues.apache.org/jira/browse/SOLR-14788 > Project: Solr > Issue Type: Task >Reporter: Mark Robert Miller >Assignee: Mark Robert Miller >Priority: Critical > Time Spent: 4h > Remaining Estimate: 0h > > h3. > [!https://www.unicode.org/consortium/aacimg/1F46E.png!|https://www.unicode.org/consortium/adopted-characters.html#b1F46E]{color:#00875a}*The > Policeman is on duty!*{color} > {quote}_{color:#de350b}*When The Policeman is on duty, sit back, relax, and > have some fun. Try to make some progress. Don't stress too much about the > impact of your changes or maintaining stability and performance and > correctness so much. Until the end of phase 1, I've got your back. I have a > variety of tools and contraptions I have been building over the years and I > will continue training them on this branch. I will review your changes and > peer out across the land and course correct where needed. As Mike D will be > thinking, "Sounds like a bottleneck Mark." And indeed it will be to some > extent. Which is why once stage one is completed, I will flip The Policeman > to off duty. When off duty, I'm always* {color:#de350b}*occasionally*{color} > *down for some vigilante justice, but I won't be walking the beat, all that > stuff about sit back and relax goes out the window.*{color}_ > {quote} > > I have stolen this title from Ishan or Noble and Ishan. > This issue is meant to capture the work of a small team that is forming to > push Solr and SolrCloud to the next phase. > I have kicked off the work with an effort to create a very fast and solid > base. That work is not 100% done, but it's ready to join the fight. > Tim Potter has started giving me a tremendous hand in finishing up. Ishan and > Noble have already contributed support and testing and have plans for > additional work to shore up some of our current shortcomings. > Others have expressed an interest in helping and hopefully they will pop up > here as well. > Let's organize and discuss our efforts here and in various sub issues. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14788) Solr: The Next Big Thing
[ https://issues.apache.org/jira/browse/SOLR-14788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17246281#comment-17246281 ] Mark Robert Miller commented on SOLR-14788: --- [~markus17] I was planning on shortcutting for a variety of reasons a couple months back, but given the opportunity stack things in my favor more completely before gambling caveats and split priorities, known feedback, I had to take it. If you have the opportunity to take a look again, sometime after this week is going to be the good entry point. > Solr: The Next Big Thing > > > Key: SOLR-14788 > URL: https://issues.apache.org/jira/browse/SOLR-14788 > Project: Solr > Issue Type: Task >Reporter: Mark Robert Miller >Assignee: Mark Robert Miller >Priority: Critical > Time Spent: 4h > Remaining Estimate: 0h > > h3. > [!https://www.unicode.org/consortium/aacimg/1F46E.png!|https://www.unicode.org/consortium/adopted-characters.html#b1F46E]{color:#00875a}*The > Policeman is on duty!*{color} > {quote}_{color:#de350b}*When The Policeman is on duty, sit back, relax, and > have some fun. Try to make some progress. Don't stress too much about the > impact of your changes or maintaining stability and performance and > correctness so much. Until the end of phase 1, I've got your back. I have a > variety of tools and contraptions I have been building over the years and I > will continue training them on this branch. I will review your changes and > peer out across the land and course correct where needed. As Mike D will be > thinking, "Sounds like a bottleneck Mark." And indeed it will be to some > extent. Which is why once stage one is completed, I will flip The Policeman > to off duty. When off duty, I'm always* {color:#de350b}*occasionally*{color} > *down for some vigilante justice, but I won't be walking the beat, all that > stuff about sit back and relax goes out the window.*{color}_ > {quote} > > I have stolen this title from Ishan or Noble and Ishan. > This issue is meant to capture the work of a small team that is forming to > push Solr and SolrCloud to the next phase. > I have kicked off the work with an effort to create a very fast and solid > base. That work is not 100% done, but it's ready to join the fight. > Tim Potter has started giving me a tremendous hand in finishing up. Ishan and > Noble have already contributed support and testing and have plans for > additional work to shore up some of our current shortcomings. > Others have expressed an interest in helping and hopefully they will pop up > here as well. > Let's organize and discuss our efforts here and in various sub issues. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-14788) Solr: The Next Big Thing
[ https://issues.apache.org/jira/browse/SOLR-14788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17246281#comment-17246281 ] Mark Robert Miller edited comment on SOLR-14788 at 12/9/20, 4:28 AM: - [~markus17] I was planning on shortcutting for a variety of reasons a couple months back, but given the opportunity to stack things in my favor more completely before gambling caveats and split priorities, known feedback, etc, I had to take it. If you have the opportunity to take a look again, sometime after this week is going to be the good entry point. was (Author: markrmiller): [~markus17] I was planning on shortcutting for a variety of reasons a couple months back, but given the opportunity stack things in my favor more completely before gambling caveats and split priorities, known feedback, I had to take it. If you have the opportunity to take a look again, sometime after this week is going to be the good entry point. > Solr: The Next Big Thing > > > Key: SOLR-14788 > URL: https://issues.apache.org/jira/browse/SOLR-14788 > Project: Solr > Issue Type: Task >Reporter: Mark Robert Miller >Assignee: Mark Robert Miller >Priority: Critical > Time Spent: 4h > Remaining Estimate: 0h > > h3. > [!https://www.unicode.org/consortium/aacimg/1F46E.png!|https://www.unicode.org/consortium/adopted-characters.html#b1F46E]{color:#00875a}*The > Policeman is on duty!*{color} > {quote}_{color:#de350b}*When The Policeman is on duty, sit back, relax, and > have some fun. Try to make some progress. Don't stress too much about the > impact of your changes or maintaining stability and performance and > correctness so much. Until the end of phase 1, I've got your back. I have a > variety of tools and contraptions I have been building over the years and I > will continue training them on this branch. I will review your changes and > peer out across the land and course correct where needed. As Mike D will be > thinking, "Sounds like a bottleneck Mark." And indeed it will be to some > extent. Which is why once stage one is completed, I will flip The Policeman > to off duty. When off duty, I'm always* {color:#de350b}*occasionally*{color} > *down for some vigilante justice, but I won't be walking the beat, all that > stuff about sit back and relax goes out the window.*{color}_ > {quote} > > I have stolen this title from Ishan or Noble and Ishan. > This issue is meant to capture the work of a small team that is forming to > push Solr and SolrCloud to the next phase. > I have kicked off the work with an effort to create a very fast and solid > base. That work is not 100% done, but it's ready to join the fight. > Tim Potter has started giving me a tremendous hand in finishing up. Ishan and > Noble have already contributed support and testing and have plans for > additional work to shore up some of our current shortcomings. > Others have expressed an interest in helping and hopefully they will pop up > here as well. > Let's organize and discuss our efforts here and in various sub issues. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] gf2121 commented on pull request #2113: LUCENE-9629: Use computed masks
gf2121 commented on pull request #2113: URL: https://github.com/apache/lucene-solr/pull/2113#issuecomment-741520801 > You need to either return a value from the benchmark methods or call blackhole.consume, otherwise the JVM will detect that everything is unused outside of the scope and optimize it away. That should get you some different results. Thank you for being thorough! Thank you for the clue! Based on your guidance, I tried some more benchmark, but find array val is alway faster... here are the codes and results (code is used to shows the way that i tried to prevent jvm optimize, so only one method is enough). 1. return an array result ``` public long[] decode0() { for (int iter = 0, tmpIdx = 0, longsIdx = 30; iter < 2; ++iter, tmpIdx += 15, longsIdx += 1) { long l0 = (TMP[tmpIdx+0] & MASKS16_1[0]) << 14; l0 |= (TMP[tmpIdx+1] & MASKS16_1[0]) << 13; l0 |= (TMP[tmpIdx+2] & MASKS16_1[0]) << 12; l0 |= (TMP[tmpIdx+3] & MASKS16_1[0]) << 11; l0 |= (TMP[tmpIdx+4] & MASKS16_1[0]) << 10; l0 |= (TMP[tmpIdx+5] & MASKS16_1[0]) << 9; l0 |= (TMP[tmpIdx+6] & MASKS16_1[0]) << 8; l0 |= (TMP[tmpIdx+7] & MASKS16_1[0]) << 7; l0 |= (TMP[tmpIdx+8] & MASKS16_1[0]) << 6; l0 |= (TMP[tmpIdx+9] & MASKS16_1[0]) << 5; l0 |= (TMP[tmpIdx+10] & MASKS16_1[0]) << 4; l0 |= (TMP[tmpIdx+11] & MASKS16_1[0]) << 3; l0 |= (TMP[tmpIdx+12] & MASKS16_1[0]) << 2; l0 |= (TMP[tmpIdx+13] & MASKS16_1[0]) << 1; l0 |= (TMP[tmpIdx+14] & MASKS16_1[0]) << 0; ARR[longsIdx+0] = l0; } return ARR; } ``` method | speed (ops/s) | - MyBenchmark.decode0 | 92215691.271 ± 1149229.830 MyBenchmark.decode1 | 62019521.428 ± 4268837.164 MyBenchmark.decode2 | 62595196.347 ± 1434012.058 2. return an long result ``` public long decode0() { for (int iter = 0, tmpIdx = 0, longsIdx = 30; iter < 2; ++iter, tmpIdx += 15, longsIdx += 1) { long l0 = (TMP[tmpIdx+0] & MASKS16_1[0]) << 14; l0 |= (TMP[tmpIdx+1] & MASKS16_1[0]) << 13; l0 |= (TMP[tmpIdx+2] & MASKS16_1[0]) << 12; l0 |= (TMP[tmpIdx+3] & MASKS16_1[0]) << 11; l0 |= (TMP[tmpIdx+4] & MASKS16_1[0]) << 10; l0 |= (TMP[tmpIdx+5] & MASKS16_1[0]) << 9; l0 |= (TMP[tmpIdx+6] & MASKS16_1[0]) << 8; l0 |= (TMP[tmpIdx+7] & MASKS16_1[0]) << 7; l0 |= (TMP[tmpIdx+8] & MASKS16_1[0]) << 6; l0 |= (TMP[tmpIdx+9] & MASKS16_1[0]) << 5; l0 |= (TMP[tmpIdx+10] & MASKS16_1[0]) << 4; l0 |= (TMP[tmpIdx+11] & MASKS16_1[0]) << 3; l0 |= (TMP[tmpIdx+12] & MASKS16_1[0]) << 2; l0 |= (TMP[tmpIdx+13] & MASKS16_1[0]) << 1; l0 |= (TMP[tmpIdx+14] & MASKS16_1[0]) << 0; ARR[longsIdx+0] = l0; } return ARR[31]; } ``` method | speed (ops/s) | - MyBenchmark.decode0 | 92470935.234 ± 3525240.576 MyBenchmark.decode1 | 62389057.277 ± 567747.489 MyBenchmark.decode2 | 62141559.925 ± 1012364.417 3. 
blackwhole consume last ``` public void decode0(Blackhole blackhole) { for (int iter = 0, tmpIdx = 0, longsIdx = 30; iter < 2; ++iter, tmpIdx += 15, longsIdx += 1) { long l0 = (TMP[tmpIdx+0] & MASKS16_1[0]) << 14; l0 |= (TMP[tmpIdx+1] & MASKS16_1[0]) << 13; l0 |= (TMP[tmpIdx+2] & MASKS16_1[0]) << 12; l0 |= (TMP[tmpIdx+3] & MASKS16_1[0]) << 11; l0 |= (TMP[tmpIdx+4] & MASKS16_1[0]) << 10; l0 |= (TMP[tmpIdx+5] & MASKS16_1[0]) << 9; l0 |= (TMP[tmpIdx+6] & MASKS16_1[0]) << 8; l0 |= (TMP[tmpIdx+7] & MASKS16_1[0]) << 7; l0 |= (TMP[tmpIdx+8] & MASKS16_1[0]) << 6; l0 |= (TMP[tmpIdx+9] & MASKS16_1[0]) << 5; l0 |= (TMP[tmpIdx+10] & MASKS16_1[0]) << 4; l0 |= (TMP[tmpIdx+11] & MASKS16_1[0]) << 3; l0 |= (TMP[tmpIdx+12] & MASKS16_1[0]) << 2; l0 |= (TMP[tmpIdx+13] & MASKS16_1[0]) << 1; l0 |= (TMP[tmpIdx+14] & MASKS16_1[0]) << 0; ARR[longsIdx+0] = l0; } blackhole.consume(ARR[30]); blackhole.consume(ARR[31]); } ``` method | speed (ops/s) | - MyBenchmark.decode0 | 79570016.826 ± 1210338.335 MyBenchmark.decode1 | 58225242.201 ± 905039.184 MyBenchmark.decode2 | 58524381.688 ± 585220.494 4. blackwhole consume in loop ``` public void decode0(Blackhole blackhole) { for (int iter = 0, tmpIdx = 0, longsIdx = 30; iter < 2; ++iter, tmpIdx += 15, longsIdx += 1) {
[GitHub] [lucene-solr] gf2121 edited a comment on pull request #2113: LUCENE-9629: Use computed masks
gf2121 edited a comment on pull request #2113: URL: https://github.com/apache/lucene-solr/pull/2113#issuecomment-741520801 > You need to either return a value from the benchmark methods or call blackhole.consume, otherwise the JVM will detect that everything is unused outside of the scope and optimize it away. That should get you some different results. Thank you for being thorough! Thank you for the clue! Based on your guidance, I tried some more benchmark, but find array val is alway faster... here are the codes and results (code is used to show the way prevent jvm optimize, so only one method is enough here). 1. return an array result ``` public long[] decode0() { for (int iter = 0, tmpIdx = 0, longsIdx = 30; iter < 2; ++iter, tmpIdx += 15, longsIdx += 1) { long l0 = (TMP[tmpIdx+0] & MASKS16_1[0]) << 14; l0 |= (TMP[tmpIdx+1] & MASKS16_1[0]) << 13; l0 |= (TMP[tmpIdx+2] & MASKS16_1[0]) << 12; l0 |= (TMP[tmpIdx+3] & MASKS16_1[0]) << 11; l0 |= (TMP[tmpIdx+4] & MASKS16_1[0]) << 10; l0 |= (TMP[tmpIdx+5] & MASKS16_1[0]) << 9; l0 |= (TMP[tmpIdx+6] & MASKS16_1[0]) << 8; l0 |= (TMP[tmpIdx+7] & MASKS16_1[0]) << 7; l0 |= (TMP[tmpIdx+8] & MASKS16_1[0]) << 6; l0 |= (TMP[tmpIdx+9] & MASKS16_1[0]) << 5; l0 |= (TMP[tmpIdx+10] & MASKS16_1[0]) << 4; l0 |= (TMP[tmpIdx+11] & MASKS16_1[0]) << 3; l0 |= (TMP[tmpIdx+12] & MASKS16_1[0]) << 2; l0 |= (TMP[tmpIdx+13] & MASKS16_1[0]) << 1; l0 |= (TMP[tmpIdx+14] & MASKS16_1[0]) << 0; ARR[longsIdx+0] = l0; } return ARR; } ``` method | speed (ops/s) | - MyBenchmark.decode0 | 92215691.271 ± 1149229.830 MyBenchmark.decode1 | 62019521.428 ± 4268837.164 MyBenchmark.decode2 | 62595196.347 ± 1434012.058 2. return an long result ``` public long decode0() { for (int iter = 0, tmpIdx = 0, longsIdx = 30; iter < 2; ++iter, tmpIdx += 15, longsIdx += 1) { long l0 = (TMP[tmpIdx+0] & MASKS16_1[0]) << 14; l0 |= (TMP[tmpIdx+1] & MASKS16_1[0]) << 13; l0 |= (TMP[tmpIdx+2] & MASKS16_1[0]) << 12; l0 |= (TMP[tmpIdx+3] & MASKS16_1[0]) << 11; l0 |= (TMP[tmpIdx+4] & MASKS16_1[0]) << 10; l0 |= (TMP[tmpIdx+5] & MASKS16_1[0]) << 9; l0 |= (TMP[tmpIdx+6] & MASKS16_1[0]) << 8; l0 |= (TMP[tmpIdx+7] & MASKS16_1[0]) << 7; l0 |= (TMP[tmpIdx+8] & MASKS16_1[0]) << 6; l0 |= (TMP[tmpIdx+9] & MASKS16_1[0]) << 5; l0 |= (TMP[tmpIdx+10] & MASKS16_1[0]) << 4; l0 |= (TMP[tmpIdx+11] & MASKS16_1[0]) << 3; l0 |= (TMP[tmpIdx+12] & MASKS16_1[0]) << 2; l0 |= (TMP[tmpIdx+13] & MASKS16_1[0]) << 1; l0 |= (TMP[tmpIdx+14] & MASKS16_1[0]) << 0; ARR[longsIdx+0] = l0; } return ARR[31]; } ``` method | speed (ops/s) | - MyBenchmark.decode0 | 92470935.234 ± 3525240.576 MyBenchmark.decode1 | 62389057.277 ± 567747.489 MyBenchmark.decode2 | 62141559.925 ± 1012364.417 3. 
blackwhole consume last ``` public void decode0(Blackhole blackhole) { for (int iter = 0, tmpIdx = 0, longsIdx = 30; iter < 2; ++iter, tmpIdx += 15, longsIdx += 1) { long l0 = (TMP[tmpIdx+0] & MASKS16_1[0]) << 14; l0 |= (TMP[tmpIdx+1] & MASKS16_1[0]) << 13; l0 |= (TMP[tmpIdx+2] & MASKS16_1[0]) << 12; l0 |= (TMP[tmpIdx+3] & MASKS16_1[0]) << 11; l0 |= (TMP[tmpIdx+4] & MASKS16_1[0]) << 10; l0 |= (TMP[tmpIdx+5] & MASKS16_1[0]) << 9; l0 |= (TMP[tmpIdx+6] & MASKS16_1[0]) << 8; l0 |= (TMP[tmpIdx+7] & MASKS16_1[0]) << 7; l0 |= (TMP[tmpIdx+8] & MASKS16_1[0]) << 6; l0 |= (TMP[tmpIdx+9] & MASKS16_1[0]) << 5; l0 |= (TMP[tmpIdx+10] & MASKS16_1[0]) << 4; l0 |= (TMP[tmpIdx+11] & MASKS16_1[0]) << 3; l0 |= (TMP[tmpIdx+12] & MASKS16_1[0]) << 2; l0 |= (TMP[tmpIdx+13] & MASKS16_1[0]) << 1; l0 |= (TMP[tmpIdx+14] & MASKS16_1[0]) << 0; ARR[longsIdx+0] = l0; } blackhole.consume(ARR[30]); blackhole.consume(ARR[31]); } ``` method | speed (ops/s) | - MyBenchmark.decode0 | 79570016.826 ± 1210338.335 MyBenchmark.decode1 | 58225242.201 ± 905039.184 MyBenchmark.decode2 | 58524381.688 ± 585220.494 4. blackwhole consume in loop ``` public void decode0(Blackhole blackhole) { for (int iter = 0, tmpIdx = 0, longsIdx = 30; iter < 2; ++iter, tmpIdx += 15, longsIdx += 1) {
[GitHub] [lucene-solr] gf2121 edited a comment on pull request #2113: LUCENE-9629: Use computed masks
gf2121 edited a comment on pull request #2113: URL: https://github.com/apache/lucene-solr/pull/2113#issuecomment-741520801 > You need to either return a value from the benchmark methods or call blackhole.consume, otherwise the JVM will detect that everything is unused outside of the scope and optimize it away. That should get you some different results. Thank you for being thorough! Thank you for the clue! Based on your guidance, I tried some more benchmark, but find array val is alway faster... here are the codes and results (code is used to show the way prevent jvm optimize, so only one method is pasted here). 1. return an array result ``` public long[] decode0() { for (int iter = 0, tmpIdx = 0, longsIdx = 30; iter < 2; ++iter, tmpIdx += 15, longsIdx += 1) { long l0 = (TMP[tmpIdx+0] & MASKS16_1[0]) << 14; l0 |= (TMP[tmpIdx+1] & MASKS16_1[0]) << 13; l0 |= (TMP[tmpIdx+2] & MASKS16_1[0]) << 12; l0 |= (TMP[tmpIdx+3] & MASKS16_1[0]) << 11; l0 |= (TMP[tmpIdx+4] & MASKS16_1[0]) << 10; l0 |= (TMP[tmpIdx+5] & MASKS16_1[0]) << 9; l0 |= (TMP[tmpIdx+6] & MASKS16_1[0]) << 8; l0 |= (TMP[tmpIdx+7] & MASKS16_1[0]) << 7; l0 |= (TMP[tmpIdx+8] & MASKS16_1[0]) << 6; l0 |= (TMP[tmpIdx+9] & MASKS16_1[0]) << 5; l0 |= (TMP[tmpIdx+10] & MASKS16_1[0]) << 4; l0 |= (TMP[tmpIdx+11] & MASKS16_1[0]) << 3; l0 |= (TMP[tmpIdx+12] & MASKS16_1[0]) << 2; l0 |= (TMP[tmpIdx+13] & MASKS16_1[0]) << 1; l0 |= (TMP[tmpIdx+14] & MASKS16_1[0]) << 0; ARR[longsIdx+0] = l0; } return ARR; } ``` method | speed (ops/s) | - MyBenchmark.decode0 | 92215691.271 ± 1149229.830 MyBenchmark.decode1 | 62019521.428 ± 4268837.164 MyBenchmark.decode2 | 62595196.347 ± 1434012.058 2. return an long result ``` public long decode0() { for (int iter = 0, tmpIdx = 0, longsIdx = 30; iter < 2; ++iter, tmpIdx += 15, longsIdx += 1) { long l0 = (TMP[tmpIdx+0] & MASKS16_1[0]) << 14; l0 |= (TMP[tmpIdx+1] & MASKS16_1[0]) << 13; l0 |= (TMP[tmpIdx+2] & MASKS16_1[0]) << 12; l0 |= (TMP[tmpIdx+3] & MASKS16_1[0]) << 11; l0 |= (TMP[tmpIdx+4] & MASKS16_1[0]) << 10; l0 |= (TMP[tmpIdx+5] & MASKS16_1[0]) << 9; l0 |= (TMP[tmpIdx+6] & MASKS16_1[0]) << 8; l0 |= (TMP[tmpIdx+7] & MASKS16_1[0]) << 7; l0 |= (TMP[tmpIdx+8] & MASKS16_1[0]) << 6; l0 |= (TMP[tmpIdx+9] & MASKS16_1[0]) << 5; l0 |= (TMP[tmpIdx+10] & MASKS16_1[0]) << 4; l0 |= (TMP[tmpIdx+11] & MASKS16_1[0]) << 3; l0 |= (TMP[tmpIdx+12] & MASKS16_1[0]) << 2; l0 |= (TMP[tmpIdx+13] & MASKS16_1[0]) << 1; l0 |= (TMP[tmpIdx+14] & MASKS16_1[0]) << 0; ARR[longsIdx+0] = l0; } return ARR[31]; } ``` method | speed (ops/s) | - MyBenchmark.decode0 | 92470935.234 ± 3525240.576 MyBenchmark.decode1 | 62389057.277 ± 567747.489 MyBenchmark.decode2 | 62141559.925 ± 1012364.417 3. 
blackwhole consume last ``` public void decode0(Blackhole blackhole) { for (int iter = 0, tmpIdx = 0, longsIdx = 30; iter < 2; ++iter, tmpIdx += 15, longsIdx += 1) { long l0 = (TMP[tmpIdx+0] & MASKS16_1[0]) << 14; l0 |= (TMP[tmpIdx+1] & MASKS16_1[0]) << 13; l0 |= (TMP[tmpIdx+2] & MASKS16_1[0]) << 12; l0 |= (TMP[tmpIdx+3] & MASKS16_1[0]) << 11; l0 |= (TMP[tmpIdx+4] & MASKS16_1[0]) << 10; l0 |= (TMP[tmpIdx+5] & MASKS16_1[0]) << 9; l0 |= (TMP[tmpIdx+6] & MASKS16_1[0]) << 8; l0 |= (TMP[tmpIdx+7] & MASKS16_1[0]) << 7; l0 |= (TMP[tmpIdx+8] & MASKS16_1[0]) << 6; l0 |= (TMP[tmpIdx+9] & MASKS16_1[0]) << 5; l0 |= (TMP[tmpIdx+10] & MASKS16_1[0]) << 4; l0 |= (TMP[tmpIdx+11] & MASKS16_1[0]) << 3; l0 |= (TMP[tmpIdx+12] & MASKS16_1[0]) << 2; l0 |= (TMP[tmpIdx+13] & MASKS16_1[0]) << 1; l0 |= (TMP[tmpIdx+14] & MASKS16_1[0]) << 0; ARR[longsIdx+0] = l0; } blackhole.consume(ARR[30]); blackhole.consume(ARR[31]); } ``` method | speed (ops/s) | - MyBenchmark.decode0 | 79570016.826 ± 1210338.335 MyBenchmark.decode1 | 58225242.201 ± 905039.184 MyBenchmark.decode2 | 58524381.688 ± 585220.494 4. blackwhole consume in loop ``` public void decode0(Blackhole blackhole) { for (int iter = 0, tmpIdx = 0, longsIdx = 30; iter < 2; ++iter, tmpIdx += 15, longsIdx += 1) {
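To summarize the JMH discussion above, here is a minimal, self-contained sketch (a hypothetical class and data set, not the benchmark actually used in this thread) of the two ways mentioned to keep the JIT from eliminating the work as dead code: returning the computed value, or handing it to Blackhole.consume.
```java
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.infra.Blackhole;

@State(Scope.Benchmark)
@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.SECONDS)
public class MaskBenchmark {

  private static final long MASK = (1L << 1) - 1; // same shape as MASKS16_1[0]
  private final long[] tmp = new long[32];        // stand-in for TMP
  private final long[] arr = new long[32];        // stand-in for ARR

  // Option 1: return the result so the JIT cannot prove it is unused.
  @Benchmark
  public long[] decodeReturn() {
    long l0 = 0;
    for (int i = 0; i < 15; i++) {
      l0 |= (tmp[i] & MASK) << (14 - i);
    }
    arr[30] = l0;
    return arr;
  }

  // Option 2: hand the result to the Blackhole so it is observably consumed.
  @Benchmark
  public void decodeBlackhole(Blackhole blackhole) {
    long l0 = 0;
    for (int i = 0; i < 15; i++) {
      l0 |= (tmp[i] & MASK) << (14 - i);
    }
    arr[30] = l0;
    blackhole.consume(arr[30]);
  }
}
```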
[GitHub] [lucene-solr] munendrasn commented on a change in pull request #2121: SOLR-10860: Return proper error code for bad input incase of inplace updates
munendrasn commented on a change in pull request #2121: URL: https://github.com/apache/lucene-solr/pull/2121#discussion_r539024109 ## File path: solr/core/src/test/org/apache/solr/update/TestInPlaceUpdatesStandalone.java ## @@ -121,6 +123,36 @@ public void deleteAllAndCommit() throws Exception { assertU(commit("softCommit", "false")); } + @Test + public void testUpdateBadRequest() throws Exception { +final long version1 = addAndGetVersion(sdoc("id", "1", "title_s", "first", "inplace_updatable_float", 41), null); +assertU(commit()); + +// invalid value with set operation +SolrException e = expectThrows(SolrException.class, +() -> addAndAssertVersion(version1, "id", "1", "inplace_updatable_float", map("set", "NOT_NUMBER"))); +assertEquals(SolrException.ErrorCode.BAD_REQUEST.code, e.code()); +MatcherAssert.assertThat(e.getMessage(), containsString("For input string: \"NOT_NUMBER\"")); + +// invalid value with inc operation +e = expectThrows(SolrException.class, +() -> addAndAssertVersion(version1, "id", "1", "inplace_updatable_float", map("inc", "NOT_NUMBER"))); +assertEquals(SolrException.ErrorCode.BAD_REQUEST.code, e.code()); +MatcherAssert.assertThat(e.getMessage(), containsString("For input string: \"NOT_NUMBER\"")); + +// inc op with null value +e = expectThrows(SolrException.class, +() -> addAndAssertVersion(version1, "id", "1", "inplace_updatable_float", map("inc", null))); +assertEquals(SolrException.ErrorCode.BAD_REQUEST.code, e.code()); +MatcherAssert.assertThat(e.getMessage(), containsString("Invalid input 'null' for field inplace_updatable_float")); + +e = expectThrows(SolrException.class, +() -> addAndAssertVersion(version1, "id", "1", "inplace_updatable_float", Review comment: We can increment a float by an integer. This particular test input verifies the case where, instead of a single number, a list of numbers is passed. Previously, Solr used to return 500; with the current changes, a Bad Request is returned: `"Invalid input '[123]' for field inplace_updatable_float"` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] munendrasn commented on a change in pull request #2121: SOLR-10860: Return proper error code for bad input incase of inplace updates
munendrasn commented on a change in pull request #2121: URL: https://github.com/apache/lucene-solr/pull/2121#discussion_r539029127 ## File path: solr/core/src/java/org/apache/solr/update/processor/AtomicUpdateDocumentMerger.java ## @@ -143,6 +147,15 @@ public SolrInputDocument merge(final SolrInputDocument fromDoc, SolrInputDocumen return toDoc; } + private static String getID(SolrInputDocument doc, IndexSchema schema) { +String id = ""; Review comment: I'm thinking of rephrasing the above error message to something like the snippet below, so that it is better than the previous message. If the id is not known, then I think it may be better not to send anything related to the id, wdyt? ``` "Error:" + getID(toDoc, schema) + "Unknown operation for the an atomic update : " + key; ``` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] munendrasn commented on a change in pull request #2121: SOLR-10860: Return proper error code for bad input incase of inplace updates
munendrasn commented on a change in pull request #2121: URL: https://github.com/apache/lucene-solr/pull/2121#discussion_r539029723 ## File path: solr/core/src/java/org/apache/solr/update/processor/AtomicUpdateDocumentMerger.java ## @@ -553,7 +574,15 @@ private Object getNativeFieldValue(String fieldName, Object val) { return val; } SchemaField sf = schema.getField(fieldName); -return sf.getType().toNativeType(val); +try { + return sf.getType().toNativeType(val); +} catch (SolrException ex) { + throw new SolrException(SolrException.ErrorCode.getErrorCode(ex.code()), + "Error converting field '" + sf.getName() + "'='" +val+"' to native type, msg=" + ex.getMessage(), ex); Review comment: The cause gets lost in the metadata section of the response, so I thought this would give simpler insight into the error. Also, I'm trying to follow the same convention as the other error messages in DocumentBuilder. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-10732) potential optimizations in callers of SolrIndexSearcher.numDocs when docset is empty
[ https://issues.apache.org/jira/browse/SOLR-10732?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17246305#comment-17246305 ] Munendra S N commented on SOLR-10732: - {quote}I'm curious, Munendra S N – were you able to perceive a performance benefit with these changes? Where these optimizations are located, afaict they optimize edge cases, and the query-building they prevent (if I'm reading right) is generally pretty lightweight (e.g., TermQuery ...).{quote} Changes here are based on this [comment|https://issues.apache.org/jira/browse/SOLR-10727?focusedCommentId=16020247&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16020247]. As you said, this tries to avoid additional object creation and computation for some edge cases and based on my understanding, it helps especially in case facet queries or group facets {quote}By way of contrast (wrt complexity/benefit tradeoff), at the leaf level it looks like SolrIndexSearcher.getDocSet(Query, DocSet) could be optimized in a way analogous to what SOLR-10727 does for SolrIndexSearcher.numDocs(Query, DocSet), avoiding filterCache pollution {quote} +1, If there is possibility to improve/optimize it. We should definitely do it but I think it should be handled in its own issue {quote}or maybe also higher up in the program logic, to prune as much execution as possible (and when it's clearer how/why we got the point of having an empty domain). The changes here seem to be building in mid-level "shot in the dark" safeguards, where it's relatively unclear what's going on.{quote} Initially, planned to make these changes in getFacetCounts which would handle the case for intervalFacet and heatmap but realized changes would be too cluttered so, decided to delegate handling this case respective types. Let me know if this could be simplified and probably handle other facets too > potential optimizations in callers of SolrIndexSearcher.numDocs when docset > is empty > > > Key: SOLR-10732 > URL: https://issues.apache.org/jira/browse/SOLR-10732 > Project: Solr > Issue Type: Improvement >Reporter: Chris M. Hostetter >Priority: Major > Attachments: SOLR-10732.patch > > Time Spent: 1.5h > Remaining Estimate: 0h > > spin off of SOLR-10727... > {quote} > ...why not (also) optimize it slightly higher up and completely avoid the > construction of the Query objects? (and in some cases: additional overhead) > for example: the first usage of {{SolrIndexSearcher.numDocs(Query,DocSet)}} i > found was {{RangeFacetProcessor.rangeCount(DocSet subset,...)}} ... if the > first line of that method was {{if (0 == subset.size()) return 0}} then we'd > not only optimize away the SolrIndexSearcher hit, but also fetching the > SchemaField & building the range query (not to mention the much more > expensive {{getGroupedFacetQueryCount}} in the grouping case) > At a glance, most other callers of > {{SolrIndexSearcher.numDocs(Query,DocSet)}} could be trivially optimize this > way as well -- at a minimum to eliminate Query parsing/construction. > {quote} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
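For readers skimming this issue, the short-circuit being proposed looks roughly like the following (a sketch only: the method body is simplified and the signature is assumed from the issue description, not copied from the actual patch):
{code:java}
// Sketch of the early return discussed above; skips SchemaField/Query
// construction and the SolrIndexSearcher call when the base DocSet is empty.
protected int rangeCount(DocSet subset, SchemaField sf, String low, String high,
                         boolean iLow, boolean iHigh) throws IOException {
  if (0 == subset.size()) {
    return 0; // empty domain: any intersection count is trivially zero
  }
  Query rangeQ = sf.getType().getRangeQuery(null, sf, low, high, iLow, iHigh);
  return searcher.numDocs(rangeQ, subset);
}
{code}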
[jira] [Commented] (SOLR-14688) First party package implementation design
[ https://issues.apache.org/jira/browse/SOLR-14688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17246314#comment-17246314 ] Tomas Eduardo Fernandez Lobbe commented on SOLR-14688: -- There were some discussions in Slack the last couple days that I'd like to bring here since they are related to this Jira issue. The threads are [this|https://the-asf.slack.com/archives/CNMTSU970/p1582794103004800] and [this|https://the-asf.slack.com/archives/CNMTSU970/p1607019429038000]. While there are different opinions expressed in those threads, the general sentiment is that there are missing pieces to the stories of packages/plugins, that need to be resolved before we proceed here. In particular, compatibility and offline installs. I'm now going to bring a summary of my comments and concerns here. In the current state of things with package manager and if this issue is implemented, someone could technically install a package from a different version, but this can only be achieved by ensuring binary compatibility between core and the packages across those different versions. This is way more than what we guarantee today (think of something in analysis-extras calling {{coreContainer.getCore().getLatestSchema().callSomethingAddedIn8.9()}}, would break in any core version previous to 8.9). Guaranteeing this binary compatibility would require a ton of testing (every package version against every core version that's supported), it would put a lot of burden on us developers, making it very difficult to add/change/deprecate/retire code and may even make major upgrades impossible, or if we just take binary compatibility as a "best effort", we'll make it difficult on the users to figure out which version of what is compatible with other versions. One question that I raised is, why do we want people to install a newer contribs/packages into an older core version? Why don't we instead encourage people to upgrade Solr by making it easier to do? Major version upgrades could be more problematic because of index compatibility, yes, but really, having binary compatibility across major upgrades is going to be very, very hard. There is also great concern about the inability to install packages offline, and how that affects the ability to install/deploy first/third party plugins (a bunch of people expressed this in particular in those Slack threads I mentioned). I believe the root of the problem is the fact that packages *have* to be cluster-wide now. Instead of being able to create the deployable in some build infrastructure, away from production environments, and then move that deployable across your different environments such as "dev", "qa", "prod", or whatever you have, the current implementation only allows one to configure a cluster once it's created and running, doing API calls (forcing to enable package manager AFAIK, even if no code needs to be added dynamically later), and exposing the production environment to either a package repository or even internet. I believe packages (first, or third party) could work better if they could be local to a node (and this doesn't mean there can't be cluster-wide packages, but we need at least the "local" option). 
People could then, for example, create their Docker image like (and these are not real commands, just get the idea): {noformat} FROM official-docker-image-slim:x.y.z ADD /some/build/path/custom-plugin1 /some/location/in/solr/custom-plugin1 RUN /solr/bin/solr install custom-plugin1 /some/location/in/solr/custom-plugin1 RUN /solr/bin/solr install analysis-extra solr.apache.org/packages/analysis-extra/x.y.z #or RUN /solr/bin/solr install analysis-extra /first/party/plugins/location {noformat} (The example is with Docker, but similar things can be done with other deployables, like AMIs in AWS, or I'm sure any container technology.) And then just build it and deploy it. If you are using the Kubernetes Solr operator, it's a single command and the upgrade will start safely and automatically. It's also important to mention that any upgrade could look just the same, regardless if what you changed was Solr core, first party or third party plugin. I'm +1 on making the code more modular and independent, have better, well thought interfaces like the one created for the replica placement framework and much of the work ab has been doing to define higher level interfaces of things currently need rework, but I think with the current state of package manager, with cluster-wide packages, this issue is very dangerous. > First party package implementation design > - > > Key: SOLR-14688 > URL: https://issues.apache.org/jira/browse/SOLR-14688 > Project: Solr > Issue Type: Improvement >Reporter: Noble Paul >Priority: Major >