[GitHub] [lucene-solr] iverase commented on a change in pull request #2155: LUCENE-9641: Support for spatial relationships in LatLonPoint

2020-12-22 Thread GitBox


iverase commented on a change in pull request #2155:
URL: https://github.com/apache/lucene-solr/pull/2155#discussion_r547165818



##
File path: lucene/core/src/java/org/apache/lucene/document/LatLonPointQuery.java
##
@@ -0,0 +1,174 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.document;
+
+import org.apache.lucene.document.ShapeField.QueryRelation;
+import org.apache.lucene.geo.Component2D;
+import org.apache.lucene.geo.GeoEncodingUtils;
+import org.apache.lucene.geo.LatLonGeometry;
+import org.apache.lucene.geo.Line;
+import org.apache.lucene.index.PointValues.Relation;
+import org.apache.lucene.util.NumericUtils;
+
+import java.util.Arrays;
+import java.util.function.Function;
+import java.util.function.Predicate;
+
+import static org.apache.lucene.geo.GeoEncodingUtils.decodeLatitude;
+import static org.apache.lucene.geo.GeoEncodingUtils.decodeLongitude;
+import static org.apache.lucene.geo.GeoEncodingUtils.encodeLatitude;
+import static org.apache.lucene.geo.GeoEncodingUtils.encodeLongitude;
+
+/**
+ * Finds all previously indexed geo points that satisfy the given {@link QueryRelation}
+ * with the specified array of {@link LatLonGeometry}.
+ *
+ * The field must be indexed using one or more {@link LatLonPoint} added per document.
+ **/
+final class LatLonPointQuery extends SpatialQuery {
+  private final LatLonGeometry[] geometries;
+  private final Component2D component2D;
+  
+  /**
+   * Creates a query that matches all indexed shapes to the provided array of {@link LatLonGeometry}
+   */
+  LatLonPointQuery(String field, QueryRelation queryRelation, LatLonGeometry... geometries) {
+    super(field, queryRelation);
+    if (queryRelation == QueryRelation.WITHIN) {
+      for (LatLonGeometry geometry : geometries) {
+        if (geometry instanceof Line) {
+          // TODO: line queries do not support within relations
+          throw new IllegalArgumentException("LatLonPointQuery does not support " + QueryRelation.WITHIN + " queries with line geometries");
+        }
+      }
+    }
+    this.component2D = LatLonGeometry.create(geometries);
+    this.geometries = geometries.clone();
+  }
+  
+  @Override
+  protected SpatialVisitor getSpatialVisitor() {
+    if (component2D.getMinY() > component2D.getMaxY()) {
+      // encodeLatitudeCeil may cause minY to be > maxY iff
+      // the delta between the latitudes is smaller than the encoding resolution
+      return EMPTYVISITOR;
Review comment:
   The problem was in the way the component predicate is built. I had 
another thought and I think I pushed a better solution in 1cebeff.
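
   For context, a hedged sketch of how such a predicate is typically consulted 
per point once built (illustrative; the exact visitor code is in the PR):

       // Component2DPredicate tests encoded lat/lon integers directly,
       // avoiding a decode for every indexed point.
       GeoEncodingUtils.Component2DPredicate pred =
           GeoEncodingUtils.createComponentPredicate(component2D);
       boolean matches = pred.test(encodeLatitude(lat), encodeLongitude(lon));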





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] iverase commented on a change in pull request #2155: LUCENE-9641: Support for spatial relationships in LatLonPoint

2020-12-22 Thread GitBox


iverase commented on a change in pull request #2155:
URL: https://github.com/apache/lucene-solr/pull/2155#discussion_r547166564



##
File path: lucene/core/src/java/org/apache/lucene/document/LatLonPointQuery.java
##
@@ -0,0 +1,174 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.document;
+
+import org.apache.lucene.document.ShapeField.QueryRelation;
+import org.apache.lucene.geo.Component2D;
+import org.apache.lucene.geo.GeoEncodingUtils;
+import org.apache.lucene.geo.LatLonGeometry;
+import org.apache.lucene.geo.Line;
+import org.apache.lucene.index.PointValues.Relation;
+import org.apache.lucene.util.NumericUtils;
+
+import java.util.Arrays;
+import java.util.function.Function;
+import java.util.function.Predicate;
+
+import static org.apache.lucene.geo.GeoEncodingUtils.decodeLatitude;
+import static org.apache.lucene.geo.GeoEncodingUtils.decodeLongitude;
+import static org.apache.lucene.geo.GeoEncodingUtils.encodeLatitude;
+import static org.apache.lucene.geo.GeoEncodingUtils.encodeLongitude;
+
+/**
+ * Finds all previously indexed geo points that satisfy the given {@link QueryRelation}
+ * with the specified array of {@link LatLonGeometry}.
+ *
+ * The field must be indexed using one or more {@link LatLonPoint} added per document.
+ **/
+final class LatLonPointQuery extends SpatialQuery {
+  private final LatLonGeometry[] geometries;
+  private final Component2D component2D;
+  
+  /**
+   * Creates a query that matches all indexed shapes to the provided array of {@link LatLonGeometry}
+   */
+  LatLonPointQuery(String field, QueryRelation queryRelation, LatLonGeometry... geometries) {
+    super(field, queryRelation);
+    if (queryRelation == QueryRelation.WITHIN) {
+      for (LatLonGeometry geometry : geometries) {
+        if (geometry instanceof Line) {
+          // TODO: line queries do not support within relations
+          throw new IllegalArgumentException("LatLonPointQuery does not support " + QueryRelation.WITHIN + " queries with line geometries");
+        }
+      }
+    }
+    this.component2D = LatLonGeometry.create(geometries);
+    this.geometries = geometries.clone();
+  }
+  
+  @Override
+  protected SpatialVisitor getSpatialVisitor() {
+    if (component2D.getMinY() > component2D.getMaxY()) {
+      // encodeLatitudeCeil may cause minY to be > maxY iff
+      // the delta between the latitudes is smaller than the encoding resolution
+      return EMPTYVISITOR;
+    }
+    final GeoEncodingUtils.Component2DPredicate component2DPredicate = GeoEncodingUtils.createComponentPredicate(component2D);
+    // bounding box over all geometries; this can speed up tree intersection and cheaply improve the approximation for complex multi-geometries
+    final byte[] minLat = new byte[Integer.BYTES];
+    final byte[] maxLat = new byte[Integer.BYTES];
+    final byte[] minLon = new byte[Integer.BYTES];
+    final byte[] maxLon = new byte[Integer.BYTES];
+    NumericUtils.intToSortableBytes(encodeLatitude(component2D.getMinY()), minLat, 0);
+    NumericUtils.intToSortableBytes(encodeLatitude(component2D.getMaxY()), maxLat, 0);
+    NumericUtils.intToSortableBytes(encodeLongitude(component2D.getMinX()), minLon, 0);
+    NumericUtils.intToSortableBytes(encodeLongitude(component2D.getMaxX()), maxLon, 0);
+
+    return new SpatialVisitor() {
+      @Override
+      protected Relation relate(byte[] minPackedValue, byte[] maxPackedValue) {
+        if (Arrays.compareUnsigned(minPackedValue, 0, Integer.BYTES, maxLat, 0, Integer.BYTES) > 0 ||
+            Arrays.compareUnsigned(maxPackedValue, 0, Integer.BYTES, minLat, 0, Integer.BYTES) < 0 ||
+            Arrays.compareUnsigned(minPackedValue, Integer.BYTES, Integer.BYTES + Integer.BYTES, maxLon, 0, Integer.BYTES) > 0 ||
+            Arrays.compareUnsigned(maxPackedValue, Integer.BYTES, Integer.BYTES + Integer.BYTES, minLon, 0, Integer.BYTES) < 0) {
+          // outside of global bounding box range
+          return Relation.CELL_OUTSIDE_QUERY;
+        }
+
+        double cellMinLat = decodeLatitude(minPackedValue, 0);
+        double cellMinLon = decodeLongitude(minPackedValue, Integer.BYTES);
+        double cellMaxLat = decodeLatitude(maxPackedValue, 0);

[GitHub] [lucene-solr] murblanc commented on a change in pull request #2148: SOLR-15052: Per-replica states for reducing overseer bottlenecks

2020-12-22 Thread GitBox


murblanc commented on a change in pull request #2148:
URL: https://github.com/apache/lucene-solr/pull/2148#discussion_r547236069



##
File path: solr/core/src/java/org/apache/solr/cloud/overseer/CollectionMutator.java
##
@@ -136,8 +154,13 @@ public ZkWriteCommand modifyCollection(final ClusterState clusterState, ZkNodePr
       return ZkStateWriter.NO_OP;
     }
 
-    return new ZkWriteCommand(coll.getName(),
-        new DocCollection(coll.getName(), coll.getSlicesMap(), m, coll.getRouter(), coll.getZNodeVersion(), coll.getZNode()));
+    DocCollection collection = new DocCollection(coll.getName(), coll.getSlicesMap(), m, coll.getRouter(), coll.getZNodeVersion(), coll.getZNode());
+    if (replicaOps == null) {
+      return new ZkWriteCommand(coll.getName(), collection);
+    } else {
+      return new ZkWriteCommand(coll.getName(), collection, replicaOps, true);
+    }

Review comment:
   My bad, the two are different.
   When `replicaOps` is `null`, `ops` in `ZkWriteCommand` is set to 
`PerReplicaStates.WriteOps.touchChildren()`.
   When `replicaOps` is not `null`, `ops` in `ZkWriteCommand` is set to 
`replicaOps`.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] gerlowskija merged pull request #2146: SOLR-15049: Add TopLevelJoinQuery optimization for 'self-joins'

2020-12-22 Thread GitBox


gerlowskija merged pull request #2146:
URL: https://github.com/apache/lucene-solr/pull/2146


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-15049) Optimize TopLevelJoinQuery "self joins"

2020-12-22 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17253501#comment-17253501
 ] 

ASF subversion and git services commented on SOLR-15049:


Commit 8b272a0960b619664ae9abe4ea2812330f0b2d5d in lucene-solr's branch 
refs/heads/master from Jason Gerlowski
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=8b272a0 ]

SOLR-15049: Add TopLevelJoinQuery optimization for 'self-joins' (#2146)



> Optimize TopLevelJoinQuery "self joins"
> ---
>
> Key: SOLR-15049
> URL: https://issues.apache.org/jira/browse/SOLR-15049
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: query parsers
>Affects Versions: master (9.0)
>Reporter: Jason Gerlowski
>Assignee: Jason Gerlowski
>Priority: Minor
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The "TopLevelJoinQuery" join implementation added recently performs well in a 
> variety of circumstances.  However we can improve the performance further by 
> adding logic to handle the special case where fields being joined are the 
> same (e.g. {{fromIndex}} is null and {{fromField}} == {{toField}}).
> Currently this self-join use case still does a lot of work to convert "from" 
> ordinals into "to" ordinals.  If we make the query (or QParserPlugin?) smart 
> enough to recognize this special case we can skip this step and improve 
> performance greatly in this case.
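
A minimal sketch of the special case described in the quoted issue (a 
hypothetical helper, not the actual TopLevelJoinQuery code):

{code:java}
class SelfJoinCheck {
  // With no cross-core join and identical from/to fields, the "from" ordinals
  // are already valid "to" ordinals, so the conversion step can be skipped.
  static boolean isSelfJoin(String fromIndex, String fromField, String toField) {
    return fromIndex == null && fromField.equals(toField);
  }
}
{code}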



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-7344) Deletion by query of uncommitted docs not working with DV updates

2020-12-22 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-7344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17253513#comment-17253513
 ] 

David Smiley commented on LUCENE-7344:
--

I know it's been 4 years, but I want to add that a Lucene Query has a Weight 
with a relatively new method, {{isCacheable}}, that is a reasonable 
approximation for whether Solr DBQ processing could skip the realtimeSearcher 
reopen.  This method must return false if there are any DocValues-dependent 
clauses in the Query.  It can also return false in some other cases (e.g. a 
BooleanQuery with many clauses), but usually it will not.
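
A minimal sketch of such a check, assuming a hypothetical helper name (this is 
not existing Solr code):

{code:java}
import java.io.IOException;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreMode;
import org.apache.lucene.search.Weight;

class DbqCacheabilityCheck {
  // Returns false as soon as any segment reports the weight as not cacheable,
  // e.g. when a clause depends on DocValues that may have been updated.
  static boolean cacheableEverywhere(IndexSearcher searcher, Query query) throws IOException {
    Weight w = searcher.createWeight(searcher.rewrite(query), ScoreMode.COMPLETE_NO_SCORES, 1f);
    for (LeafReaderContext leaf : searcher.getIndexReader().leaves()) {
      if (w.isCacheable(leaf) == false) {
        return false; // a DBQ would still need the realtimeSearcher reopen
      }
    }
    return true;
  }
}
{code}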

> Deletion by query of uncommitted docs not working with DV updates
> -
>
> Key: LUCENE-7344
> URL: https://issues.apache.org/jira/browse/LUCENE-7344
> Project: Lucene - Core
>  Issue Type: Bug
>Reporter: Ishan Chattopadhyaya
>Priority: Major
> Attachments: LUCENE-7344.patch, LUCENE-7344.patch, LUCENE-7344.patch, 
> LUCENE-7344.patch, LUCENE-7344.patch, LUCENE-7344.patch, 
> LUCENE-7344_hoss_experiment.patch
>
>
> When DVs are updated, delete by query doesn't work with the updated DV value.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-15049) Optimize TopLevelJoinQuery "self joins"

2020-12-22 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17253528#comment-17253528
 ] 

ASF subversion and git services commented on SOLR-15049:


Commit 7f8e260c124e94ddd5033ee30249cf56f813729e in lucene-solr's branch 
refs/heads/branch_8x from Jason Gerlowski
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=7f8e260 ]

SOLR-15049: Add TopLevelJoinQuery optimization for 'self-joins'


> Optimize TopLevelJoinQuery "self joins"
> ---
>
> Key: SOLR-15049
> URL: https://issues.apache.org/jira/browse/SOLR-15049
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: query parsers
>Affects Versions: master (9.0)
>Reporter: Jason Gerlowski
>Assignee: Jason Gerlowski
>Priority: Minor
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The "TopLevelJoinQuery" join implementation added recently performs well in a 
> variety of circumstances.  However we can improve the performance further by 
> adding logic to handle the special case where fields being joined are the 
> same (e.g. {{fromIndex}} is null and {{fromField}} == {{toField}}).
> Currently this self-join use case still does a lot of work to convert "from" 
> ordinals into "to" ordinals.  If we make the query (or QParserPlugin?) smart 
> enough to recognize this special case we can skip this step and improve 
> performance greatly in this case.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Resolved] (SOLR-15049) Optimize TopLevelJoinQuery "self joins"

2020-12-22 Thread Jason Gerlowski (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Gerlowski resolved SOLR-15049.

Fix Version/s: master (9.0)
   8.8
   Resolution: Fixed

> Optimize TopLevelJoinQuery "self joins"
> ---
>
> Key: SOLR-15049
> URL: https://issues.apache.org/jira/browse/SOLR-15049
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: query parsers
>Affects Versions: master (9.0)
>Reporter: Jason Gerlowski
>Assignee: Jason Gerlowski
>Priority: Minor
> Fix For: 8.8, master (9.0)
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> The "TopLevelJoinQuery" join implementation added recently performs well in a 
> variety of circumstances.  However we can improve the performance further by 
> adding logic to handle the special case where fields being joined are the 
> same (e.g. {{fromIndex}} is null and {{fromField}} == {{toField}}).
> Currently this self-join use case still does a lot of work to convert "from" 
> ordinals into "to" ordinals.  If we make the query (or QParserPlugin?) smart 
> enough to recognize this special case we can skip this step and improve 
> performance greatly in this case.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] msokolov merged pull request #2157: LUCENE-9644: diversity heuristic for HNSW graph neighbor selection

2020-12-22 Thread GitBox


msokolov merged pull request #2157:
URL: https://github.com/apache/lucene-solr/pull/2157


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Created] (SOLR-15057) avoid unnecessary object retention in FacetRangeProcessor

2020-12-22 Thread Christine Poerschke (Jira)
Christine Poerschke created SOLR-15057:
--

 Summary: avoid unnecessary object retention in FacetRangeProcessor
 Key: SOLR-15057
 URL: https://issues.apache.org/jira/browse/SOLR-15057
 Project: Solr
  Issue Type: Task
Reporter: Christine Poerschke
Assignee: Christine Poerschke


(pull request to follow)
 * The (private) {{doSubs}} method is a no-op if there are no sub-facets.
 * The (private) {{intersections}} and {{filters}} arrays are only used by the 
{{doSubs}} method.
 * The (private) {{rangeStats}} method currently always populates the 
{{intersections}} and {{filters}} arrays, even when nothing actually 
subsequently uses them.
 * If {{rangeStats}} only populated the {{intersections}} array when it's 
actually needed then the {{DocSet intersection}} object would remain local in 
scope and hence the garbage collector could collect it earlier.

[https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.7.0/solr/core/src/java/org/apache/solr/search/facet/FacetRangeProcessor.java#L531-L555]
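
A minimal, self-contained illustration of the pattern described above (not the 
actual FacetRangeProcessor code; the names here are invented for the sketch):

{code:java}
class RangeStatsSketch {
  // The expensive intermediate ("intersection", standing in for the DocSet)
  // is created only when sub-facets will consume it; otherwise it never
  // escapes local scope and the garbage collector can reclaim it immediately.
  long rangeStats(int[] docs, boolean hasSubFacets) {
    long count = docs.length; // the cheap stat that is always needed
    if (hasSubFacets) {
      int[] intersection = docs.clone(); // stand-in for the DocSet intersection
      doSubs(intersection);
    }
    return count;
  }

  void doSubs(int[] intersection) {
    // sub-facet processing would consume the intersection here
  }
}
{code}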



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (LUCENE-9644) HNSW diverse neighbor selection heuristic

2020-12-22 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17253546#comment-17253546
 ] 

ASF subversion and git services commented on LUCENE-9644:
-

Commit e1cd426bce39abc4345b748d9cff5ff7fe10315f in lucene-solr's branch 
refs/heads/master from Michael Sokolov
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=e1cd426 ]

LUCENE-9644: diversity heuristic for HNSW graph neighbor selection (#2157)

 * Additional options to KnnGraphTester to support benchmarking with 
ann-benchmarks
 * switch to parallel array-based storage in HnswGraph (was using LongHeap)

> HNSW diverse neighbor selection heuristic
> -
>
> Key: LUCENE-9644
> URL: https://issues.apache.org/jira/browse/LUCENE-9644
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael Sokolov
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> This will replace the simple nearest neighbor selection with a criterion that 
> takes into account the distance of the neighbors from each other. It is seen 
> to provide dramatically improved recall on at least two datasets, and is what 
> is being used by our reference implementation, hnswlib. The basic idea is 
> that when selecting  the nearest neighbors to associate with a new node added 
> to the graph, we filter using a diversity criterion. If a candidate neighbor 
> is closer to an already-added (closer to the new node) neighbor than it is to 
> the new node, then we pass over it, moving on to more-distant, but presumably 
> more diverse neighbors. The same criterion is also (re-) applied to the 
> neighbor nodes' neighbors, since we add the links bidirectionally.
> h2. Results:
> h3. GloVe/Wikipedia
> h4. baseline
> ||recall||latency ms||nDoc||fanout||maxConn||beamWidth||visited||index ms||
> |0.643|0.77|  10| 50| 32| 64| 1742|   22830|
> |0.671|0.95|  10| 100|32| 64| 2141|   0|
> |0.704|1.32|  10| 200|32| 64| 2923|   0|
> |0.739|2.04|  10| 400|32| 64| 4382|   0|
> |0.470|0.91|  100|50| 32| 64| 2068|   337081|
> |0.496|1.21|  100|100|32| 64| 2548|   0|
> |0.533|1.77|  100|200|32| 64| 3479|   0|
> |0.573|2.58|  100|400|32| 64| 5257|   0|
> h4. diverse
> ||recall||latency ms||nDoc||fanout||maxConn||beamWidth||visited||index ms||
> |0.801|0.57|  10| 50| 32| 64| 593|17985|
> |0.840|0.67|  10| 100|32| 64| 738|0|
> |0.883|0.97|  10| 200|32| 64| 1018|   0|
> |0.921|1.36|  10| 400|32| 64| 1502|   0|
> |0.723|0.71|  100|50| 32| 64| 860|298383|
> |0.761|0.77|  100|100|32| 64| 1058|   0|
> |0.806|1.06|  100|200|32| 64| 1442|   0|
> |0.854|1.67|  100|400|32| 64| 2159|   0|
> h3. Dataset from work:
> h4. baseline
> ||recall||latency ms||nDoc||fanout||maxConn||beamWidth||visited||index ms||
> |0.933|1.41|  10| 50| 32| 64| 1496|   35462|
> |0.948|1.39|  10| 100|32| 64| 1872|   0|
> |0.961|2.10|  10| 200|32| 64| 2591|   0|
> |0.972|3.04|  10| 400|32| 64| 3939|   0|
> |0.827|1.34|  100|50| 32| 64| 1676|   535802|
> |0.854|1.76|  100|100|32| 64| 2056|   0|
> |0.887|2.47|  100|200|32| 64| 2761|   0|
> |0.907|3.75|  100|400|32| 64| 4129|   0|
> h4. diverse
> ||recall||latency ms||nDoc||fanout||maxConn||beamWidth||visited||index ms||
> |0.966|1.18|  10| 50| 32| 64| 1480|   37656|
> |0.977|1.46|  10| 100|32| 64| 1832|   0|
> |0.988|2.00|  10| 200|32| 64| 2472|   0|
> |0.995|3.14|  10| 400|32| 64| 3629|   0|
> |0.944|1.34|  100|50| 32| 64| 1780|   526834|
> |0.959|1.71|  100|100|32| 64| |   0|
> |0.975|2.30|  100|200|32| 64| 3041|   0|
> |0.986|3.56|  100|400|32| 64| 4543|   0|



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Resolved] (LUCENE-9644) HNSW diverse neighbor selection heuristic

2020-12-22 Thread Michael Sokolov (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Sokolov resolved LUCENE-9644.
-
Resolution: Fixed

> HNSW diverse neighbor selection heuristic
> -
>
> Key: LUCENE-9644
> URL: https://issues.apache.org/jira/browse/LUCENE-9644
> Project: Lucene - Core
>  Issue Type: Improvement
>Reporter: Michael Sokolov
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> This will replace the simple nearest neighbor selection with a criterion that 
> takes into account the distance of the neighbors from each other. It is seen 
> to provide dramatically improved recall on at least two datasets, and is what 
> is being used by our reference implementation, hnswlib. The basic idea is 
> that when selecting  the nearest neighbors to associate with a new node added 
> to the graph, we filter using a diversity criterion. If a candidate neighbor 
> is closer to an already-added (closer to the new node) neighbor than it is to 
> the new node, then we pass over it, moving on to more-distant, but presumably 
> more diverse neighbors. The same criterion is also (re-) applied to the 
> neighbor nodes' neighbors, since we add the links bidirectionally.
> h2. Results:
> h3. GloVe/Wikipedia
> h4. baseline
> ||recall||latency ms||nDoc||fanout||maxConn||beamWidth||visited||index ms||
> |0.643|0.77|  10| 50| 32| 64| 1742|   22830|
> |0.671|0.95|  10| 100|32| 64| 2141|   0|
> |0.704|1.32|  10| 200|32| 64| 2923|   0|
> |0.739|2.04|  10| 400|32| 64| 4382|   0|
> |0.470|0.91|  100|50| 32| 64| 2068|   337081|
> |0.496|1.21|  100|100|32| 64| 2548|   0|
> |0.533|1.77|  100|200|32| 64| 3479|   0|
> |0.573|2.58|  100|400|32| 64| 5257|   0|
> h4. diverse
> ||recall||latency ms||nDoc||fanout||maxConn||beamWidth||visited||index ms||
> |0.801|0.57|  10| 50| 32| 64| 593|17985|
> |0.840|0.67|  10| 100|32| 64| 738|0|
> |0.883|0.97|  10| 200|32| 64| 1018|   0|
> |0.921|1.36|  10| 400|32| 64| 1502|   0|
> |0.723|0.71|  100|50| 32| 64| 860|298383|
> |0.761|0.77|  100|100|32| 64| 1058|   0|
> |0.806|1.06|  100|200|32| 64| 1442|   0|
> |0.854|1.67|  100|400|32| 64| 2159|   0|
> h3. Dataset from work:
> h4. baseline
> ||recall||latency ms||nDoc||fanout||maxConn||beamWidth||visited||index ms||
> |0.933|1.41|  10| 50| 32| 64| 1496|   35462|
> |0.948|1.39|  10| 100|32| 64| 1872|   0|
> |0.961|2.10|  10| 200|32| 64| 2591|   0|
> |0.972|3.04|  10| 400|32| 64| 3939|   0|
> |0.827|1.34|  100|50| 32| 64| 1676|   535802|
> |0.854|1.76|  100|100|32| 64| 2056|   0|
> |0.887|2.47|  100|200|32| 64| 2761|   0|
> |0.907|3.75|  100|400|32| 64| 4129|   0|
> h4. diverse
> ||recall||latency ms||nDoc||fanout||maxConn||beamWidth||visited||index ms||
> |0.966|1.18|  10| 50| 32| 64| 1480|   37656|
> |0.977|1.46|  10| 100|32| 64| 1832|   0|
> |0.988|2.00|  10| 200|32| 64| 2472|   0|
> |0.995|3.14|  10| 400|32| 64| 3629|   0|
> |0.944|1.34|  100|50| 32| 64| 1780|   526834|
> |0.959|1.71|  100|100|32| 64| |   0|
> |0.975|2.30|  100|200|32| 64| 3041|   0|
> |0.986|3.56|  100|400|32| 64| 4543|   0|



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] cpoerschke opened a new pull request #2160: SOLR-15057: avoid unnecessary object retention in FacetRangeProcessor

2020-12-22 Thread GitBox


cpoerschke opened a new pull request #2160:
URL: https://github.com/apache/lucene-solr/pull/2160


   https://issues.apache.org/jira/browse/SOLR-15057



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (SOLR-15057) avoid unnecessary object retention in FacetRangeProcessor

2020-12-22 Thread Christine Poerschke (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christine Poerschke updated SOLR-15057:
---
Description: 
 * The (private) {{doSubs}} method is a no-op if there are no sub-facets.
 * The (private) {{intersections}} and {{filters}} arrays are only used by the 
{{doSubs}} method.
 * The (private) {{rangeStats}} method currently always populates the 
{{intersections}} and {{filters}} arrays, even when nothing actually 
subsequently uses them.
 * If {{rangeStats}} only populated the {{intersections}} array when it's 
actually needed then the {{DocSet intersection}} object would remain local in 
scope and hence the garbage collector could collect it earlier.

[https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.7.0/solr/core/src/java/org/apache/solr/search/facet/FacetRangeProcessor.java#L531-L555]

  was:
(pull request to follow)
 * The (private) {{doSubs}} method is a no-op if there are no sub-facets.
 * The (private) {{intersections}} and {{filters}} arrays are only used by the 
{{doSubs}} method.
 * The (private) {{rangeStats}} method currently always populates the 
{{intersections}} and {{filters}} arrays, even when nothing actually 
subsequently uses them.
 * If {{rangeStats}} only populated the {{intersections}} array when it's 
actually needed then the {{DocSet intersection}} object would remain local in 
scope and hence the garbage collector could collect it earlier.

[https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.7.0/solr/core/src/java/org/apache/solr/search/facet/FacetRangeProcessor.java#L531-L555]


> avoid unnecessary object retention in FacetRangeProcessor
> -
>
> Key: SOLR-15057
> URL: https://issues.apache.org/jira/browse/SOLR-15057
> Project: Solr
>  Issue Type: Task
>Reporter: Christine Poerschke
>Assignee: Christine Poerschke
>Priority: Minor
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
>  * The (private) {{doSubs}} method is a no-op if there are no sub-facets.
>  * The (private) {{intersections}} and {{filters}} arrays are only used by 
> the {{doSubs}} method.
>  * The (private) {{rangeStats}} method currently always populates the 
> {{intersections}} and {{filters}} arrays, even when nothing actually 
> subsequently uses them.
>  * If {{rangeStats}} only populated the {{intersections}} array when it's 
> actually needed then the {{DocSet intersection}} object would remain local in 
> scope and hence the garbage collector could collect it earlier.
> [https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.7.0/solr/core/src/java/org/apache/solr/search/facet/FacetRangeProcessor.java#L531-L555]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] muse-dev[bot] commented on a change in pull request #2153: LUCENE-9570: A non-mergable PR with reformatted code.

2020-12-22 Thread GitBox


muse-dev[bot] commented on a change in pull request #2153:
URL: https://github.com/apache/lucene-solr/pull/2153#discussion_r547326682



##
File path: lucene/core/src/test/org/apache/lucene/util/TestSmallFloat.java
##
@@ -167,22 +170,24 @@ public void testByte4() {
     }
   }
 
-  /***
-  // Do an exhaustive test of all possible floating point values
-  // for the 315 float against the original norm encoding in Similarity.
-  // Takes 75 seconds on my Pentium4 3GHz, with Java5 -server
+  @Ignore("One-time test.")
   public void testAllFloats() {
-    for(int i = Integer.MIN_VALUE;;i++) {
+    for (int i = Integer.MIN_VALUE; ; i++) {
       float f = Float.intBitsToFloat(i);
-      if (f==f) { // skip non-numbers
+      if (f == f) { // skip non-numbers

Review comment:
   *opt.semgrep.java.lang.correctness.eqeq.eqeq:*  `f == f` or `f != f` is 
always true. (Unless the value compared is a float or double).
   To test if `f` is not-a-number, use `Double.isNaN(f)`.
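   
   For illustration, the two checks agree because NaN is the only value not equal 
to itself (a hypothetical snippet, not part of the PR):
   
       float f = Float.intBitsToFloat(0x7fc00000); // one of the NaN bit patterns
       assert (f != f) == Float.isNaN(f);          // both detect NaN; isNaN states the intent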
   





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] cpoerschke commented on a change in pull request #2152: SOLR-14034: remove deprecated min_rf references

2020-12-22 Thread GitBox


cpoerschke commented on a change in pull request #2152:
URL: https://github.com/apache/lucene-solr/pull/2152#discussion_r547337314



##
File path: solr/core/src/test/org/apache/solr/cloud/HttpPartitionTest.java
##
@@ -548,9 +548,6 @@ protected int sendDoc(int docId, Integer minRf, SolrClient solrClient, String co
     doc.addField("a_t", "hello" + docId);
 
     UpdateRequest up = new UpdateRequest();
-    if (minRf != null) {
-      up.setParam(UpdateRequest.MIN_REPFACT, String.valueOf(minRf));
-    }

Review comment:
   Hi @trdillon - thanks for opening this pull request!
   
   Okay, I think I see it now, so your 
https://issues.apache.org/jira/browse/SOLR-14034?focusedCommentId=17223427&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17223427
 question is about this sort of test change here, i.e. whether `minRf` should 
remain an unused `sendDoc` argument or be removed. I'm thinking that here in 
`HttpPartitionTest` removal makes sense (I haven't yet looked at 
`ReplicationFactorTest`). What do you think?

##
File path: solr/core/src/test/org/apache/solr/cloud/ReplicationFactorTest.java
##
@@ -461,38 +447,24 @@ protected void doDelete(UpdateRequest req, String msg, int expectedRf, int retri
 
   protected int sendDoc(int docId, int minRf) throws Exception {
     UpdateRequest up = new UpdateRequest();
-    boolean minRfExplicit = maybeAddMinRfExplicitly(minRf, up);
     SolrInputDocument doc = new SolrInputDocument();
     doc.addField(id, String.valueOf(docId));
     doc.addField("a_t", "hello" + docId);
     up.add(doc);
-    return runAndGetAchievedRf(up, minRfExplicit, minRf);
+    return runAndGetAchievedRf(up, minRf);
  }
 
-  private int runAndGetAchievedRf(UpdateRequest up, boolean minRfExplicit, int minRf) throws SolrServerException, IOException {
+  private int runAndGetAchievedRf(UpdateRequest up, int minRf) throws SolrServerException, IOException {
     NamedList response = cloudClient.request(up);
-    if (minRfExplicit) {
-      assertMinRfInResponse(minRf, response);
-    }
     return cloudClient.getMinAchievedReplicationFactor(cloudClient.getDefaultCollection(), response);
   }
 
   private void assertMinRfInResponse(int minRf, NamedList response) {
-    Object minRfFromResponse = response.findRecursive("responseHeader", UpdateRequest.MIN_REPFACT);
+    Object minRfFromResponse = response.findRecursive("responseHeader");
     assertNotNull("Expected min_rf header in the response", minRfFromResponse);
     assertEquals("Unexpected min_rf in response", ((Integer)minRfFromResponse).intValue(), minRf);
   }

Review comment:
   Looks like `assertMinRfInResponse` is now also unused then; if so, I 
suggest removing it too.

##
File path: solr/core/src/test/org/apache/solr/cloud/ReplicationFactorTest.java
##
@@ -461,38 +447,24 @@ protected void doDelete(UpdateRequest req, String msg, int expectedRf, int retri
 
   protected int sendDoc(int docId, int minRf) throws Exception {
     UpdateRequest up = new UpdateRequest();
-    boolean minRfExplicit = maybeAddMinRfExplicitly(minRf, up);
     SolrInputDocument doc = new SolrInputDocument();
     doc.addField(id, String.valueOf(docId));
     doc.addField("a_t", "hello" + docId);
     up.add(doc);
-    return runAndGetAchievedRf(up, minRfExplicit, minRf);
+    return runAndGetAchievedRf(up, minRf);
   }
 
-  private int runAndGetAchievedRf(UpdateRequest up, boolean minRfExplicit, int minRf) throws SolrServerException, IOException {

Review comment:
   How about also removing the `minRf` argument here since it's now no 
longer used?

##
File path: solr/solrj/src/java/org/apache/solr/client/solrj/impl/BaseCloudSolrClient.java
##
@@ -716,7 +716,6 @@ protected RouteException getRouteException(SolrException.ErrorCode serverError,
         if (rf == null || routeRf < rf)
           rf = routeRf;
       }
-      minRf = (Integer)header.get(UpdateRequest.MIN_REPFACT);

Review comment:
   I suggest also removing the `Integer minRf = null;` at line 696 above.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[GitHub] [lucene-solr] cpoerschke commented on pull request #2152: SOLR-14034: remove deprecated min_rf references

2020-12-22 Thread GitBox


cpoerschke commented on pull request #2152:
URL: https://github.com/apache/lucene-solr/pull/2152#issuecomment-749604771


   At 
https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.7.0/solr/core/src/java/org/apache/solr/update/processor/DistributedUpdateProcessor.java#L754-L756
 there's also an `Implementing min_rf here was a bit tricky. ...` comment 
referencing `min_rf` -- any thoughts on either removing it or rewording it 
somehow?



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14963) Child "rows" param should apply per level

2020-12-22 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17253585#comment-17253585
 ] 

David Smiley commented on SOLR-14963:
-

It'd be great to have something for 8.8, which means getting it committed 
within about two weeks.  I'm also pushing for SOLR-14923 shortly.  Let me know 
if you need any further guidance.

> Child "rows" param should apply per level
> -
>
> Key: SOLR-14963
> URL: https://issues.apache.org/jira/browse/SOLR-14963
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: David Smiley
>Priority: Major
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The {{[child rows=10]}} doc transformer "rows" param _should_ apply per 
> parent, and it's documented this way: "The maximum number of child documents 
> to be returned per parent document.".  However, it is instead implemented as 
> an overall limit as the child documents are processed in a depth-first order 
> way.  The implementation ought to change.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Resolved] (SOLR-14088) Tika and commons-compress dependency in solr core causes classloader issue

2020-12-22 Thread Kevin Risden (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Risden resolved SOLR-14088.
-
Resolution: Cannot Reproduce

This hasn't come up again and I'm not planning to track it down any further 
right now. Marking as "Cannot Reproduce".

> Tika and commons-compress dependency in solr core causes classloader issue
> --
>
> Key: SOLR-14088
> URL: https://issues.apache.org/jira/browse/SOLR-14088
> Project: Solr
>  Issue Type: Bug
>  Components: contrib - Solr Cell (Tika extraction)
>Reporter: Kevin Risden
>Priority: Major
>
> SOLR-14086 found that if commons-compress is in core ivy.xml as a compile 
> dependency, it messes up the classloader for any commons-compress 
> dependencies. It causes issues with items like xz being loaded. 
> This is problematic because dependencies shouldn't matter based on classloader. 
> This jira is to determine if there is something wrong with Solr's classloader 
> or if it's a commons-compress issue only.
> Error message from SOLR-14086 copied below:
> {code:java}
> 
> 
> 
> Error 500 java.lang.NoClassDefFoundError: Could not initialize class 
> org.apache.commons.compress.archivers.sevenz.Coders
> 
> HTTP ERROR 500 java.lang.NoClassDefFoundError: Could not initialize 
> class org.apache.commons.compress.archivers.sevenz.Coders
> 
> URI:/solr/tika-integration-example/update/extract
> STATUS:500
> MESSAGE:java.lang.NoClassDefFoundError: Could not initialize 
> class org.apache.commons.compress.archivers.sevenz.Coders
> SERVLET:default
> CAUSED BY:java.lang.NoClassDefFoundError: Could not 
> initialize class org.apache.commons.compress.archivers.sevenz.Coders
> 
> Caused by:java.lang.NoClassDefFoundError: Could not initialize 
> class org.apache.commons.compress.archivers.sevenz.Coders
>   at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.readEncodedHeader(SevenZFile.java:437)
>   at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.readHeaders(SevenZFile.java:355)
>   at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.(SevenZFile.java:241)
>   at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.(SevenZFile.java:108)
>   at 
> org.apache.commons.compress.archivers.sevenz.SevenZFile.(SevenZFile.java:262)
>   at 
> org.apache.tika.parser.pkg.PackageParser.parse(PackageParser.java:257)
>   at 
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>   at 
> org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
>   at 
> org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
>   at 
> org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:228)
>   at 
> org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
>   at 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:208)
>   at org.apache.solr.core.SolrCore.execute(SolrCore.java:2582)
>   at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:799)
>   at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:578)
>   at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:419)
>   at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:351)
>   at 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1596)
>   at 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:545)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
>   at 
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:590)
>   at 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)
>   at 
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1607)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
>   at 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1297)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
>   at 
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:485)
>   at 
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1577)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
>   at 
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1212)
>   at 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandle

[jira] [Commented] (SOLR-15051) Shared storage -- BlobDirectory (de-duping)

2020-12-22 Thread Kevin Risden (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17253596#comment-17253596
 ] 

Kevin Risden commented on SOLR-15051:
-

I can't add a comment to the design doc, but wanted to address potentially 
misleading statements around the Solr HDFS integration.

{quote}Has an unfortunate search performance penalty. TODO ___ %.  Some 
indexing penalty too: ___ %.{quote}
There will be a performance penalty here coming from remote storage. I don't 
think this is completely avoidable. The biggest issue is on the indexing side 
where we need to ensure that documents are reliably written, but this isn't 
exactly fast on remote storage.

{quote}The implementation relies on a “BlockCache”, which means running Solr 
with large Java heaps.{quote}

The BlockCache is off heap typically with Java direct memory so shouldn't 
require a large Java heap.

{quote}
It’s not a generalized shared storage scheme; it’s HDFS specific.  It’s 
possible to plug in S3 and Alluxio to this but there is overhead.  HDFS is 
rather complex to operate, whereas say S3 is provided by cloud hosting 
providers natively.
{quote}

I'm not sure I understand this statement. There are a few parts to Hadoop. HDFS 
is the storage layer that can be complex to operate. The more interesting part 
is the Hadoop filesystem interface, which is a semi-generic adapter between the 
HDFS API and other storage backends (S3, ABFS, GCS, etc). The two pieces are 
separate and don't require each other to operate. The Hadoop filesystem 
interface provides the abstraction necessary to go from the local filesystem to 
a lot of other cloud provider storage mechanisms.

There may be some overhead there, but there has been a lot of work in the past 
1-2 years to improve performance, driven by the push for cloud storage support.

> Shared storage -- BlobDirectory (de-duping)
> ---
>
> Key: SOLR-15051
> URL: https://issues.apache.org/jira/browse/SOLR-15051
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Major
>
> This proposal is a way to accomplish shared storage in SolrCloud with a few 
> key characteristics: (A) using a Directory implementation, (B) delegates to a 
> backing local file Directory as a kind of read/write cache (C) replicas have 
> their own "space", (D) , de-duplication across replicas via reference 
> counting, (E) uses ZK but separately from SolrCloud stuff.
> The Directory abstraction is a good one, and helps isolate shared storage 
> from the rest of SolrCloud that doesn't care.  Using a backing normal file 
> Directory is faster for reads and is simpler than Solr's HDFSDirectory's 
> BlockCache.  Replicas having their own space solves the problem of multiple 
> writers (e.g. of the same shard) trying to own and write to the same space, 
> and it implies that any of Solr's replica types can be used along with what 
> goes along with them like peer-to-peer replication (sometimes faster/cheaper 
> than pulling from shared storage).  A de-duplication feature solves needless 
> duplication of files across replicas and from parent shards (i.e. from shard 
> splitting).  The de-duplication feature requires a place to cache directory 
> listings so that they can be shared across replicas and atomically updated; 
> this is handled via ZooKeeper.  Finally, some sort of Solr daemon / 
> auto-scaling code should be added to implement "autoAddReplicas", especially 
> to provide for a scenario where the leader is gone and can't be replicated 
> from directly but we can access shared storage.
> For more about shared storage concepts, consider looking at the description 
> in SOLR-13101 and the linked Google Doc.
> *[PROPOSAL 
> DOC|https://docs.google.com/document/d/1kjQPK80sLiZJyRjek_Edhokfc5q9S3ISvFRM2_YeL8M/edit?usp=sharing]*



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-14688) First party package implementation design

2020-12-22 Thread Jira


[ 
https://issues.apache.org/jira/browse/SOLR-14688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17253598#comment-17253598
 ] 

Jan Høydahl commented on SOLR-14688:


So, how can we achieve these two goals?
 # Support locally/statically installed packages on each node, avoiding the need for 
access to a remote repo at runtime in prod, or access to ZK during image build?
 # Have each Solr node independently decide which version of a package to 
choose based on its compatibility spec, to support rolling upgrades, e.g. 8.7 -> 
8.8?

 

> First party package implementation design
> -
>
> Key: SOLR-14688
> URL: https://issues.apache.org/jira/browse/SOLR-14688
> Project: Solr
>  Issue Type: Improvement
>Reporter: Noble Paul
>Priority: Major
>  Labels: package, packagemanager
>
> Here's the design document for first party packages:
> https://docs.google.com/document/d/1n7gB2JAdZhlJKFrCd4Txcw4HDkdk7hlULyAZBS-wXrE/edit?usp=sharing
> Put differently, this is about package-ifying our "contribs".



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Updated] (LUCENE-9646) Set BM25Similarity discountOverlaps via the constructor

2020-12-22 Thread Patrick Marty (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Marty updated LUCENE-9646:
--
Summary: Set BM25Similarity discountOverlaps via the constructor  (was: Set 
BM25Similarity discountOverlaps through the constructor)

> Set BM25Similarity discountOverlaps via the constructor
> ---
>
> Key: LUCENE-9646
> URL: https://issues.apache.org/jira/browse/LUCENE-9646
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: master (9.0)
>Reporter: Patrick Marty
>Priority: Trivial
>
> BM25Similarity discountOverlaps parameter is true by default.
> It can be set with 
> {{org.apache.lucene.search.similarities.BM25Similarity#setDiscountOverlaps}} 
> method.
> But this method makes BM25Similarity mutable.
>  
> discountOverlaps should be set via the constructor and 
> {{setDiscountOverlaps}} method should be removed to make BM25Similarity 
> immutable.
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[jira] [Commented] (SOLR-15052) Reducing overseer bottlenecks using per-replica states

2020-12-22 Thread Ilan Ginzburg (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17253612#comment-17253612
 ] 

Ilan Ginzburg commented on SOLR-15052:
--

I agree [~noble.paul]. There's no serialization; I meant to say that the list of 
znode children is basically read for each replica update.

Say we get a sequence of updates for {{R1, R2, R3, R4, R5}}.
Assuming {{R1}} and {{R2}} arrive at the same time, they can be executed at the 
same time as per your example (assuming the {{DocCollection 
getPerReplicaStates()}} is up to date).
If slightly later {{R3}} and {{R4}} arrive together, each is going to see a 
changed {{cversion}} if it hasn't re-read the directory listing after the 
{{R1}} & {{R2}} update. Each is going to re-read the directory before executing 
the update (done in [PerReplicaStates.fetch 
L166|https://github.com/apache/lucene-solr/pull/2148/files#diff-0bd8a828302915c525c8df3e8cccdc9881ebad121359c0dbc8374b8b72995669R166]
 called from [ZkController.publish 
L1622|https://github.com/apache/lucene-solr/pull/2148/files#diff-5b63503605ede4384429e74d1fa0c410adc5da8f3246e8c36e49feff2f3ea692R1622]
 before the call to [PerReplicaStates.persist 
L107|https://github.com/apache/lucene-solr/pull/2148/files#diff-0bd8a828302915c525c8df3e8cccdc9881ebad121359c0dbc8374b8b72995669R107]
 doing the actual [multi 
(L136)|https://github.com/apache/lucene-solr/pull/2148/files#diff-0bd8a828302915c525c8df3e8cccdc9881ebad121359c0dbc8374b8b72995669R136]
 operation).
Then the {{R5}} update is also going to read the directory listing and execute.

Basically, unless the {{PerReplicaStates}} stored in {{DocCollection}} is up to 
date for other reasons and new update requests arrive at exactly the same time, 
then each new replica update request triggers a new read of the directory 
listing. Updates are not serialized ({{R3}} and {{R4}} can execute in 
parallel), but there's some inefficiency in the way they're handled.

I wanted to see the actual impact of this. Based on [~ichattopadhyaya]'s test 
[StateListVsCASSpinlock.java|https://raw.githubusercontent.com/chatman/experiments/main/src/main/java/StateListVsCASSpinlock.java]
 I tried to get an idea of the costs of the different actions. With 500 
children znodes, {{getChildren}} took on my laptop about 10-15ms while 
{{getData}} on a single file with equivalent amount of text took longer at 
~20ms. This came as a surprise to me.

The multi operation (delete znode, create znode) took about 40ms while the CAS 
of the text file was faster at 30ms, but there were many retries in CAS as 
expected that considerably slowed down the process (got a speedup of over 10x 
by using the independent znodes vs a single text file with CAS with 500 
replicas).

The implementation in the PR could easily avoid systematically re-reading the 
znode children list by attempting the multi operation on the cached 
{{PerReplicaStates}} of the {{DocCollection}} (if not {{null}}). Only if the 
multi fails should it re-read the directory listing and try again. Maybe not 
worth it at this point though (but something to keep in mind).
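
For reference, a hedged sketch of the delete-then-create flip using ZooKeeper's 
multi API (paths and znode names here are illustrative, not Solr's actual layout):

{code:java}
import java.util.Arrays;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.Op;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

class ReplicaStateFlip {
  // Atomically replace a replica's state znode: both ops succeed or neither does.
  static void flipReplicaState(ZooKeeper zk, String collection,
                               String oldStateNode, String newStateNode)
      throws KeeperException, InterruptedException {
    String base = "/collections/" + collection + "/state.json";
    zk.multi(Arrays.asList(
        Op.delete(base + "/" + oldStateNode, -1), // -1 matches any znode version
        Op.create(base + "/" + newStateNode, new byte[0],
            ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT)));
  }
}
{code}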

> Reducing overseer bottlenecks using per-replica states
> --
>
> Key: SOLR-15052
> URL: https://issues.apache.org/jira/browse/SOLR-15052
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Ishan Chattopadhyaya
>Priority: Major
> Attachments: per-replica-states-gcp.pdf
>
>  Time Spent: 3h 10m
>  Remaining Estimate: 0h
>
> This work has the same goal as SOLR-13951, that is to reduce overseer 
> bottlenecks by avoiding replica state updates from going to the state.json 
> via the overseer. However, the approach taken here is different from 
> SOLR-13951 and hence this work supercedes that work.
> The design proposed is here: 
> https://docs.google.com/document/d/1xdxpzUNmTZbk0vTMZqfen9R3ArdHokLITdiISBxCFUg/edit
> Briefly,
> # Every replica's state will be in a separate znode nested under the 
> state.json. It has the name that encodes the replica name, state, leadership 
> status.
> # An additional children watcher to be set on state.json for state changes.
> # Upon a state change, a ZK multi-op to delete the previous znode and add a 
> new znode with new state.
> Differences between this and SOLR-13951,
> # In SOLR-13951, we planned to leverage shard terms for per shard states.
> # As a consequence, the code changes required for SOLR-13951 were massive (we 
> needed a shard state provider abstraction and introduce it everywhere in the 
> codebase).
> # This approach is a drastically simpler change and design.
> Credits for this design and the PR is due to [~noble.paul]. 
> [~markrmil...@gmail.com], [~noble.paul] and I have collaborated on this 
> effort. The reference

[GitHub] [lucene-solr] patrickmarty opened a new pull request #2161: LUCENE-9646: Set BM25Similarity discountOverlaps via the constructor

2020-12-22 Thread GitBox


patrickmarty opened a new pull request #2161:
URL: https://github.com/apache/lucene-solr/pull/2161


   # Description
   
   The BM25Similarity discountOverlaps parameter can be set with the 
org.apache.lucene.search.similarities.BM25Similarity#setDiscountOverlaps method.
   But this method makes BM25Similarity mutable.
   
   
   # Solution
   
   discountOverlaps should be set via the constructor, and the setDiscountOverlaps 
method should be removed to make BM25Similarity immutable.
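   
   A rough sketch of the resulting immutable shape (illustrative only; the actual PR diff may differ):
   
       // Illustrative sketch, not the PR's code: discountOverlaps is fixed at
       // construction time, so a configured similarity stays immutable.
       public final class ImmutableBM25Sketch {
         private final float k1;
         private final float b;
         private final boolean discountOverlaps;
       
         public ImmutableBM25Sketch(float k1, float b, boolean discountOverlaps) {
           this.k1 = k1;
           this.b = b;
           this.discountOverlaps = discountOverlaps;
         }
       
         public boolean getDiscountOverlaps() {
           return discountOverlaps;
         }
       }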
   
   
   # Tests
   
   all tests involving BM25Similarity and LegacyBM25Similarity have been updated
   
   # Checklist
   
   Please review the following and check all that apply:
   
   - [ ] I have reviewed the guidelines for [How to 
Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms 
to the standards described there to the best of my ability.
   - [ ] I have created a Jira issue and added the issue ID to my pull request 
title.
   - [ ] I have given Solr maintainers 
[access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork)
 to contribute to my PR branch. (optional but recommended)
   - [ ] I have developed this patch against the `master` branch.
   - [ ] I have run `./gradlew check`.
   - [ ] I have added tests for my changes.
   - [ ] I have added documentation for the [Ref 
Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) 
(for Solr changes only).
   






[jira] [Updated] (LUCENE-9646) Set BM25Similarity discountOverlaps via the constructor

2020-12-22 Thread Patrick Marty (Jira)


 [ 
https://issues.apache.org/jira/browse/LUCENE-9646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Marty updated LUCENE-9646:
--
Description: 
BM25Similarity discountOverlaps parameter is true by default.

It can be set with 
{{org.apache.lucene.search.similarities.BM25Similarity#setDiscountOverlaps}} 
method.

But this method makes BM25Similarity mutable.

 

discountOverlaps should be set via the constructor, and the {{setDiscountOverlaps}} 
method should be removed to make BM25Similarity immutable.

 

PR https://github.com/apache/lucene-solr/pull/2161

  was:
BM25Similarity discountOverlaps parameter is true by default.

It can be set with 
{{org.apache.lucene.search.similarities.BM25Similarity#setDiscountOverlaps}} 
method.

But this method makes BM25Similarity mutable.

 

discountOverlaps should be set via the constructor and {{setDiscountOverlaps}} 
method should be removed to make BM25Similarity immutable.

 


> Set BM25Similarity discountOverlaps via the constructor
> ---
>
> Key: LUCENE-9646
> URL: https://issues.apache.org/jira/browse/LUCENE-9646
> Project: Lucene - Core
>  Issue Type: Improvement
>  Components: core/search
>Affects Versions: master (9.0)
>Reporter: Patrick Marty
>Priority: Trivial
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> BM25Similarity discountOverlaps parameter is true by default.
> It can be set with 
> {{org.apache.lucene.search.similarities.BM25Similarity#setDiscountOverlaps}} 
> method.
> But this method makes BM25Similarity mutable.
>  
> discountOverlaps should be set via the constructor, and the 
> {{setDiscountOverlaps}} method should be removed to make BM25Similarity 
> immutable.
>  
> PR https://github.com/apache/lucene-solr/pull/2161






[GitHub] [lucene-solr] risdenk closed pull request #2008: SOLR-14951: Upgrade Angular JS 1.7.9 to 1.8.0

2020-12-22 Thread GitBox


risdenk closed pull request #2008:
URL: https://github.com/apache/lucene-solr/pull/2008


   






[jira] [Commented] (SOLR-14951) Upgrade Angular JS 1.7.9 to 1.8.0

2020-12-22 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17253640#comment-17253640
 ] 

ASF subversion and git services commented on SOLR-14951:


Commit f0b73fdc6d8d10653a2239de3071a524310f84e2 in lucene-solr's branch 
refs/heads/master from Kevin Risden
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=f0b73fd ]

SOLR-14951: Upgrade Angular JS 1.7.9 to 1.8.0

Closes PR #2008


> Upgrade Angular JS 1.7.9 to 1.8.0
> -
>
> Key: SOLR-14951
> URL: https://issues.apache.org/jira/browse/SOLR-14951
> Project: Solr
>  Issue Type: Task
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Admin UI
>Reporter: Kevin Risden
>Assignee: Kevin Risden
>Priority: Major
> Fix For: 8.8
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Angular JS released 1.8.0 to fix some security vulnerabilities. 






[jira] [Commented] (SOLR-14951) Upgrade Angular JS 1.7.9 to 1.8.0

2020-12-22 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17253670#comment-17253670
 ] 

ASF subversion and git services commented on SOLR-14951:


Commit 9e279fda14f937d7bac9721a7a5ee8d1e41f7050 in lucene-solr's branch 
refs/heads/branch_8x from Kevin Risden
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=9e279fd ]

SOLR-14951: Upgrade Angular JS 1.7.9 to 1.8.0

Closes PR #2008


> Upgrade Angular JS 1.7.9 to 1.8.0
> -
>
> Key: SOLR-14951
> URL: https://issues.apache.org/jira/browse/SOLR-14951
> Project: Solr
>  Issue Type: Task
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Admin UI
>Reporter: Kevin Risden
>Assignee: Kevin Risden
>Priority: Major
> Fix For: 8.8
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Angular JS released 1.8.0 to fix some security vulnerabilities. 






[jira] [Updated] (SOLR-14951) Upgrade Angular JS 1.7.9 to 1.8.0

2020-12-22 Thread Kevin Risden (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-14951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kevin Risden updated SOLR-14951:

Resolution: Fixed
Status: Resolved  (was: Patch Available)

> Upgrade Angular JS 1.7.9 to 1.8.0
> -
>
> Key: SOLR-14951
> URL: https://issues.apache.org/jira/browse/SOLR-14951
> Project: Solr
>  Issue Type: Task
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: Admin UI
>Reporter: Kevin Risden
>Assignee: Kevin Risden
>Priority: Major
> Fix For: 8.8
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> Angular JS released 1.8.0 to fix some security vulnerabilities. 






[jira] [Commented] (SOLR-15051) Shared storage -- BlobDirectory (de-duping)

2020-12-22 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17253696#comment-17253696
 ] 

David Smiley commented on SOLR-15051:
-

Thanks for the look [~krisden]!

bq. There will be a performance penalty here coming from remote storage.

For read performance, if files are cached locally (BlobDirectory does this), 
there isn't one: BlobDirectory simply delegates reads to {{MMapDirectory}}. 
I've seen [~tpot]'s presentation showing that {{HdfsDirectory}} has a read-time 
performance hit.

bq. The BlockCache is off heap typically with Java direct memory so shouldn't 
require a large Java heap.

Oops; thanks for the correction!

RE storage APIs/abstractions:  Are you claiming that the "Hadoop filesystem 
interface" is an ideal choice for a BlobDirectory backing store abstraction?  
BlobDirectory or whatever shared system needs to write to *something*, so I'm 
sincere in asking for your opinion on what that something should be.  I have 
basically no HDFS experience, so I was unaware of its generic API.  Even if 
that interface is nice... I suspect BlobDirectory ought to have some simple 
abstraction anyway, but I'm really not hung up on this choice.  I'm leery of 
adding too many heavy dependencies.
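To make the read path concrete, a toy sketch of that delegation (hypothetical {{BlobStore}} handle and {{downloadTo}} helper; not the actual BlobDirectory code):
{code:java}
// Toy sketch: reads are served by a local MMapDirectory; the blob store is
// only consulted to populate the local cache on a miss.
class BlobCachedDirectory extends FilterDirectory {
  private final BlobStore blobStore; // hypothetical remote-store abstraction

  BlobCachedDirectory(MMapDirectory localCache, BlobStore blobStore) {
    super(localCache);
    this.blobStore = blobStore;
  }

  @Override
  public IndexInput openInput(String name, IOContext context) throws IOException {
    if (Arrays.asList(in.listAll()).contains(name) == false) {
      blobStore.downloadTo(name, in); // hypothetical: fetch into the local cache
    }
    return in.openInput(name, context); // read cost is MMapDirectory's
  }
}
{code}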


> Shared storage -- BlobDirectory (de-duping)
> ---
>
> Key: SOLR-15051
> URL: https://issues.apache.org/jira/browse/SOLR-15051
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Major
>
> This proposal is a way to accomplish shared storage in SolrCloud with a few 
> key characteristics: (A) using a Directory implementation, (B) delegates to a 
> backing local file Directory as a kind of read/write cache, (C) replicas have 
> their own "space", (D) de-duplication across replicas via reference 
> counting, (E) uses ZK but separately from SolrCloud stuff.
> The Directory abstraction is a good one, and helps isolate shared storage 
> from the rest of SolrCloud that doesn't care.  Using a backing normal file 
> Directory is faster for reads and is simpler than Solr's HDFSDirectory's 
> BlockCache.  Replicas having their own space solves the problem of multiple 
> writers (e.g. of the same shard) trying to own and write to the same space, 
> and it implies that any of Solr's replica types can be used along with what 
> goes along with them like peer-to-peer replication (sometimes faster/cheaper 
> than pulling from shared storage).  A de-duplication feature solves needless 
> duplication of files across replicas and from parent shards (i.e. from shard 
> splitting).  The de-duplication feature requires a place to cache directory 
> listings so that they can be shared across replicas and atomically updated; 
> this is handled via ZooKeeper.  Finally, some sort of Solr daemon / 
> auto-scaling code should be added to implement "autoAddReplicas", especially 
> to provide for a scenario where the leader is gone and can't be replicated 
> from directly but we can access shared storage.
> For more about shared storage concepts, consider looking at the description 
> in SOLR-13101 and the linked Google Doc.
> *[PROPOSAL 
> DOC|https://docs.google.com/document/d/1kjQPK80sLiZJyRjek_Edhokfc5q9S3ISvFRM2_YeL8M/edit?usp=sharing]*






[jira] [Commented] (LUCENE-9570) Review code diffs after automatic formatting and correct problems before it is applied

2020-12-22 Thread Dawid Weiss (Jira)


[ 
https://issues.apache.org/jira/browse/LUCENE-9570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17253719#comment-17253719
 ] 

Dawid Weiss commented on LUCENE-9570:
-

Very long string concatenations (and arithmetic expressions) get broken over 
several lines.
{code}
 if (VERBOSE_DELETES) {
-  return "gen=" + gen + " numTerms=" + numTermDeletes + ", deleteTerms=" + 
deleteTerms
-+ ", deleteQueries=" + deleteQueries + ", fieldUpdates=" + fieldUpdates
-+ ", bytesUsed=" + bytesUsed;
+  return "gen="
+  + gen
+  + " numTerms="
+  + numTermDeletes
+  + ", deleteTerms="
+  + deleteTerms
+  + ", deleteQueries="
+  + deleteQueries
+  + ", fieldUpdates="
+  + fieldUpdates
+  + ", bytesUsed="
+  + bytesUsed;
{code}
If the grouping is semantically important, it can be done manually with 
parentheses; the formatter will then preserve the groups:
{code}
if (VERBOSE_DELETES) {
  return ("gen=" + gen)
  + (" numTerms=" + numTermDeletes)
  + (", deleteTerms=" + deleteTerms)
  + (", deleteQueries=" + deleteQueries)
  + (", fieldUpdates=" + fieldUpdates)
  + (", bytesUsed=" + bytesUsed);
{code}

> Review code diffs after automatic formatting and correct problems before it 
> is applied
> --
>
> Key: LUCENE-9570
> URL: https://issues.apache.org/jira/browse/LUCENE-9570
> Project: Lucene - Core
>  Issue Type: Sub-task
>Reporter: Dawid Weiss
>Assignee: Dawid Weiss
>Priority: Blocker
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Review and correct all the javadocs before they're messed up by automatic 
> formatting. Apply project-by-project, review diff, correct. Lots of diffs but 
> it should be relatively quick.






[jira] [Commented] (SOLR-14923) Indexing performance is unacceptable when child documents are involved

2020-12-22 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-14923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17253756#comment-17253756
 ] 

David Smiley commented on SOLR-14923:
-

I pushed a branch_8x back-port to my fork which only needed a couple trivial 
changes in NestedShardedAtomicUpdateTest:
https://github.com/dsmiley/lucene-solr/commit/0f08442c7af7f171ffcb36434bd07552303dd88f
Please do run a performance test!

> Indexing performance is unacceptable when child documents are involved
> --
>
> Key: SOLR-14923
> URL: https://issues.apache.org/jira/browse/SOLR-14923
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: update, UpdateRequestProcessors
>Affects Versions: 8.3, 8.4, 8.5, 8.6, 8.7, master (9.0)
>Reporter: Thomas Wöckinger
>Priority: Critical
>  Labels: performance, pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Parallel indexing does not make sense at the moment when child documents are 
> used.
> The org.apache.solr.update.processor.DistributedUpdateProcessor checks at the 
> end of the method doVersionAdd if Ulog caches should be refreshed.
> This check will return true if any child document is included in the 
> AddUpdateCommand.
> If so, ulog.openRealtimeSearcher() is called. This call is very expensive and 
> is executed in a synchronized block of the UpdateLog instance, therefore all 
> other operations on the UpdateLog are blocked too.
> Because every important UpdateLog method (add, delete, ...) is done using a 
> synchronized block, almost every operation is blocked.
> This reduces multi-threaded index updates to single-threaded behavior.
> The described behavior does not depend on any option of the UpdateRequest, 
> so it makes no difference whether 'waitFlush', 'waitSearcher' or 
> 'softCommit' is true or false.
> The described behavior makes the usage of ChildDocuments useless, because the 
> performance is unacceptable.
>  
>  
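To make the blocking concrete, here is a simplified model of the behavior described above (toy code, not the actual UpdateLog):
{code:java}
// Toy model: every mutator shares the UpdateLog monitor, so the expensive
// openRealtimeSearcher() call made for child documents stalls all
// concurrent adds and deletes, collapsing indexing to one thread.
class UpdateLogModel {
  synchronized void add(boolean hasChildDocuments) {
    if (hasChildDocuments) {
      openRealtimeSearcher(); // very expensive, runs while holding the lock
    }
    // ... append to the log ...
  }

  synchronized void delete() {
    // blocked for the whole duration of any add() above
  }

  private void openRealtimeSearcher() {
    // reopening a realtime searcher dominates the critical section
  }
}
{code}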






[GitHub] [lucene-solr] madrob closed pull request #2118: SOLR-15031: Prevent null being wrapped in a QueryValueSource

2020-12-22 Thread GitBox


madrob closed pull request #2118:
URL: https://github.com/apache/lucene-solr/pull/2118


   






[jira] [Commented] (SOLR-15031) NPE caused by FunctionQParser returning a null ValueSource

2020-12-22 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17253779#comment-17253779
 ] 

ASF subversion and git services commented on SOLR-15031:


Commit 9d19a5893621766be6ffd0002ef0997da6847aa5 in lucene-solr's branch 
refs/heads/branch_8x from Pieter van Boxtel
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=9d19a58 ]

SOLR-15031 Prevent null being wrapped in a QueryValueSource

closes #2118


> NPE caused by FunctionQParser returning a null ValueSource
> --
>
> Key: SOLR-15031
> URL: https://issues.apache.org/jira/browse/SOLR-15031
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Pieter
>Priority: Minor
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> When parsing a sub query in a function query, 
> {{FunctionQParser#parseValueSource}} does not check if the produced query 
> object is null. When it is, it just wraps the null in a {{QueryValueSource}} 
> object. This causes NPEs in code consuming that object. Parsed 
> queries can be null, for example when the query string only contains 
> stopwords, so we need to handle that condition.
> h3. Steps to reproduce the issue
>  # Start solr with the techproducts example collection: {{solr start -e 
> techproducts}}
>  # Add a stopword to 
> SOLR_DIR/example/techproducts/solr/techproducts/conf/stopwords.txt, for 
> example "at"
>  # Reload the core
>  # Execute a function query:
> {code:java}
> http://localhost:8983/solr/techproducts/select?fieldquery={!field%20f=features%20v=%27%22at%22%27}&q={!func}%20if($fieldquery,1,0){code}
> The following stacktrace is produced:
> {code:java}
> 2020-12-03 13:35:38.868 INFO  (qtp2095677157-21) [   x:techproducts] 
> o.a.s.c.S.Request [techproducts]  webapp=/solr path=/select 
> params={q={!func}+if($fieldquery,1,0)&fieldquery={!field+f%3Dfeatures+v%3D'"at"'}}
>  status=500 QTime=34
> 2020-12-03 13:35:38.872 ERROR (qtp2095677157-21) [   x:techproducts] 
> o.a.s.s.HttpSolrCall null:java.lang.NullPointerException
> at 
> org.apache.lucene.queries.function.valuesource.QueryValueSource.hashCode(QueryValueSource.java:63)
> at 
> org.apache.lucene.queries.function.valuesource.IfFunction.hashCode(IfFunction.java:129)
> at 
> org.apache.lucene.queries.function.FunctionQuery.hashCode(FunctionQuery.java:176)
> at 
> org.apache.solr.search.QueryResultKey.<init>(QueryResultKey.java:53)
> at 
> org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1341)
> at 
> org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:580)
> {code}
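One possible guard, as a sketch (hypothetical local names; not necessarily the exact patch that was merged):
{code:java}
// Sketch: never wrap a null Query in QueryValueSource; substitute a
// match-nothing query when the sub-query parses to null (e.g. all stopwords).
Query q = subQuery(qstr, null).getQuery();
if (q == null) {
  q = new MatchNoDocsQuery();
}
ValueSource vs = new QueryValueSource(q, 0.0f);
{code}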






[jira] [Commented] (SOLR-15031) NPE caused by FunctionQParser returning a null ValueSource

2020-12-22 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17253780#comment-17253780
 ] 

ASF subversion and git services commented on SOLR-15031:


Commit 98f12f4aeb9f6ec9f5c4de53f9faddc41043df59 in lucene-solr's branch 
refs/heads/master from Pieter van Boxtel
[ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=98f12f4 ]

SOLR-15031 Prevent null being wrapped in a QueryValueSource

closes #2118


> NPE caused by FunctionQParser returning a null ValueSource
> --
>
> Key: SOLR-15031
> URL: https://issues.apache.org/jira/browse/SOLR-15031
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Pieter
>Priority: Minor
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> When parsing a sub query in a function query, 
> {{FunctionQParser#parseValueSource}} does not check if the produced query 
> object is null. When it is, it just wraps the null in a {{QueryValueSource}} 
> object. This causes NPEs in code consuming that object. Parsed 
> queries can be null, for example when the query string only contains 
> stopwords, so we need to handle that condition.
> h3. Steps to reproduce the issue
>  # Start solr with the techproducts example collection: {{solr start -e 
> techproducts}}
>  # Add a stopword to 
> SOLR_DIR/example/techproducts/solr/techproducts/conf/stopwords.txt, for 
> example "at"
>  # Reload the core
>  # Execute a function query:
> {code:java}
> http://localhost:8983/solr/techproducts/select?fieldquery={!field%20f=features%20v=%27%22at%22%27}&q={!func}%20if($fieldquery,1,0){code}
> The following stacktrace is produced:
> {code:java}
> 2020-12-03 13:35:38.868 INFO  (qtp2095677157-21) [   x:techproducts] 
> o.a.s.c.S.Request [techproducts]  webapp=/solr path=/select 
> params={q={!func}+if($fieldquery,1,0)&fieldquery={!field+f%3Dfeatures+v%3D'"at"'}}
>  status=500 QTime=34
> 2020-12-03 13:35:38.872 ERROR (qtp2095677157-21) [   x:techproducts] 
> o.a.s.s.HttpSolrCall null:java.lang.NullPointerException
> at 
> org.apache.lucene.queries.function.valuesource.QueryValueSource.hashCode(QueryValueSource.java:63)
> at 
> org.apache.lucene.queries.function.valuesource.IfFunction.hashCode(IfFunction.java:129)
> at 
> org.apache.lucene.queries.function.FunctionQuery.hashCode(FunctionQuery.java:176)
> at 
> org.apache.solr.search.QueryResultKey.<init>(QueryResultKey.java:53)
> at 
> org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1341)
> at 
> org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:580)
> {code}






[jira] [Resolved] (SOLR-15031) NPE caused by FunctionQParser returning a null ValueSource

2020-12-22 Thread Mike Drob (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mike Drob resolved SOLR-15031.
--
Fix Version/s: master (9.0)
   8.8
 Assignee: Mike Drob
   Resolution: Fixed

Merged this in, thanks for the contribution!

> NPE caused by FunctionQParser returning a null ValueSource
> --
>
> Key: SOLR-15031
> URL: https://issues.apache.org/jira/browse/SOLR-15031
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Pieter
>Assignee: Mike Drob
>Priority: Minor
> Fix For: 8.8, master (9.0)
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> When parsing a sub query in a function query, 
> {{FunctionQParser#parseValueSource}} does not check if the produced query 
> object is null. When it is, it just wraps the null in a {{QueryValueSource}} 
> object. This causes NPEs in code consuming that object. Parsed 
> queries can be null, for example when the query string only contains 
> stopwords, so we need to handle that condition.
> h3. Steps to reproduce the issue
>  # Start solr with the techproducts example collection: {{solr start -e 
> techproducts}}
>  # Add a stopword to 
> SOLR_DIR/example/techproducts/solr/techproducts/conf/stopwords.txt, for 
> example "at"
>  # Reload the core
>  # Execute a function query:
> {code:java}
> http://localhost:8983/solr/techproducts/select?fieldquery={!field%20f=features%20v=%27%22at%22%27}&q={!func}%20if($fieldquery,1,0){code}
> The following stacktrace is produced:
> {code:java}
> 2020-12-03 13:35:38.868 INFO  (qtp2095677157-21) [   x:techproducts] 
> o.a.s.c.S.Request [techproducts]  webapp=/solr path=/select 
> params={q={!func}+if($fieldquery,1,0)&fieldquery={!field+f%3Dfeatures+v%3D'"at"'}}
>  status=500 QTime=34
> 2020-12-03 13:35:38.872 ERROR (qtp2095677157-21) [   x:techproducts] 
> o.a.s.s.HttpSolrCall null:java.lang.NullPointerException
> at 
> org.apache.lucene.queries.function.valuesource.QueryValueSource.hashCode(QueryValueSource.java:63)
> at 
> org.apache.lucene.queries.function.valuesource.IfFunction.hashCode(IfFunction.java:129)
> at 
> org.apache.lucene.queries.function.FunctionQuery.hashCode(FunctionQuery.java:176)
> at 
> org.apache.solr.search.QueryResultKey.<init>(QueryResultKey.java:53)
> at 
> org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1341)
> at 
> org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:580)
> {code}






[jira] [Created] (LUCENE-9647) Add back ant precommit on PR for branch 8

2020-12-22 Thread Mike Drob (Jira)
Mike Drob created LUCENE-9647:
-

 Summary: Add back ant precommit on PR for branch 8
 Key: LUCENE-9647
 URL: https://issues.apache.org/jira/browse/LUCENE-9647
 Project: Lucene - Core
  Issue Type: Task
  Components: general/build
Reporter: Mike Drob


When migrating everything to gradle only, we accidentally deleted our branch_8x 
PR precommit action.

The file needs to be on master, but the branch specification should specify 
branch_8x, I believe.






[GitHub] [lucene-solr] dsmiley opened a new pull request #2162: SOLR-15051 Blob, DRAFT WIP

2020-12-22 Thread GitBox


dsmiley opened a new pull request #2162:
URL: https://github.com/apache/lucene-solr/pull/2162


   https://issues.apache.org/jira/browse/SOLR-15051
   
   Remember this is very WIP... just getting started here.
   
   CC @bruno-roustant  @NazerkeBS @atris 






[jira] [Commented] (SOLR-15051) Shared storage -- BlobDirectory (de-duping)

2020-12-22 Thread David Smiley (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17253791#comment-17253791
 ] 

David Smiley commented on SOLR-15051:
-

In Solr we have a "BackupRepository" abstraction already for backup/restore.  I 
rather like it; it's somewhat generic with a bias towards working with Lucene.  
It has "Backup" in it's name but it might as well be called something like 
"RemoteStorageRepository" that is not just usable for Backup/restore but also 
as a backing abstraction for shared storage generally.  Interestingly, I see it 
uses the Lucene {{IndexInput}} for reading (instead of an InputStream), and 
supports copying via Lucene's {{Directory}} abstraction as well, CC 
[~gerlowskija] [~atris] [~varun] [~hgadre]
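For reference, a stripped-down, hypothetical shape of such an abstraction (illustrative only; not the actual BackupRepository signatures):
{code:java}
// Hypothetical distillation of a Lucene-biased remote-storage abstraction:
// reads come back as IndexInput, writes can be fed straight from a Directory.
interface RemoteStorageRepository {
  IndexInput openInput(URI location, String fileName, IOContext ctx) throws IOException;
  void copyFileFrom(Directory sourceDir, String fileName, URI dest) throws IOException;
  boolean exists(URI location) throws IOException;
  void delete(URI location, String fileName) throws IOException;
}
{code}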

> Shared storage -- BlobDirectory (de-duping)
> ---
>
> Key: SOLR-15051
> URL: https://issues.apache.org/jira/browse/SOLR-15051
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: David Smiley
>Assignee: David Smiley
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> This proposal is a way to accomplish shared storage in SolrCloud with a few 
> key characteristics: (A) using a Directory implementation, (B) delegates to a 
> backing local file Directory as a kind of read/write cache, (C) replicas have 
> their own "space", (D) de-duplication across replicas via reference 
> counting, (E) uses ZK but separately from SolrCloud stuff.
> The Directory abstraction is a good one, and helps isolate shared storage 
> from the rest of SolrCloud that doesn't care.  Using a backing normal file 
> Directory is faster for reads and is simpler than Solr's HDFSDirectory's 
> BlockCache.  Replicas having their own space solves the problem of multiple 
> writers (e.g. of the same shard) trying to own and write to the same space, 
> and it implies that any of Solr's replica types can be used along with what 
> goes along with them like peer-to-peer replication (sometimes faster/cheaper 
> than pulling from shared storage).  A de-duplication feature solves needless 
> duplication of files across replicas and from parent shards (i.e. from shard 
> splitting).  The de-duplication feature requires a place to cache directory 
> listings so that they can be shared across replicas and atomically updated; 
> this is handled via ZooKeeper.  Finally, some sort of Solr daemon / 
> auto-scaling code should be added to implement "autoAddReplicas", especially 
> to provide for a scenario where the leader is gone and can't be replicated 
> from directly but we can access shared storage.
> For more about shared storage concepts, consider looking at the description 
> in SOLR-13101 and the linked Google Doc.
> *[PROPOSAL 
> DOC|https://docs.google.com/document/d/1kjQPK80sLiZJyRjek_Edhokfc5q9S3ISvFRM2_YeL8M/edit?usp=sharing]*






[GitHub] [lucene-solr] madrob opened a new pull request #2163: LUCENE-9647 Add back github action for Ant

2020-12-22 Thread GitBox


madrob opened a new pull request #2163:
URL: https://github.com/apache/lucene-solr/pull/2163


   https://issues.apache.org/jira/browse/LUCENE-9647
   
   Tagging @tflobbe for review because you removed the file in the first place. 
If there's a better place to put it, let me know.






[GitHub] [lucene-solr] madrob commented on a change in pull request #2148: SOLR-15052: Per-replica states for reducing overseer bottlenecks

2020-12-22 Thread GitBox


madrob commented on a change in pull request #2148:
URL: https://github.com/apache/lucene-solr/pull/2148#discussion_r547542479



##
File path: 
solr/test-framework/src/java/org/apache/solr/cloud/SolrCloudTestCase.java
##
@@ -81,6 +81,7 @@
 public class SolrCloudTestCase extends SolrTestCaseJ4 {
 
   private static final Logger log = 
LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
+  public static final Boolean USE_PER_REPLICA_STATE = 
Boolean.parseBoolean(System.getProperty("use.per-replica", "false"));

Review comment:
   Can it be a setting in cluster properties that controls how new 
collections are created?
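   
   For example, something along these lines (a sketch; assumes SolrJ's 
`ClusterProperties` helper and a hypothetical `perReplicaState` cluster property):
   
   ```java
   // Sketch: let a cluster property drive the default for newly created
   // collections instead of a per-JVM system property.
   ClusterProperties props = new ClusterProperties(zkClient);
   boolean perReplicaDefault =
       Boolean.parseBoolean(props.getClusterProperty("perReplicaState", "false"));
   ```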








[GitHub] [lucene-solr] madrob commented on a change in pull request #2148: SOLR-15052: Per-replica states for reducing overseer bottlenecks

2020-12-22 Thread GitBox


madrob commented on a change in pull request #2148:
URL: https://github.com/apache/lucene-solr/pull/2148#discussion_r547542705



##
File path: solr/core/src/java/org/apache/solr/cloud/ZkController.java
##
@@ -1609,12 +1611,41 @@ public void publish(final CoreDescriptor cd, final 
Replica.State state, boolean
   if (updateLastState) {
 cd.getCloudDescriptor().setLastPublished(state);
   }
-  overseerJobQueue.offer(Utils.toJSON(m));
+  DocCollection coll = zkStateReader.getCollection(collection);
+  if (forcePublish || sendToOverseer(coll, coreNodeName)) {

Review comment:
   Some aspects of state are stored in the old place, and some in the new 
place. I'm still working on building a full mental model, so maybe this is the 
wrong question.








[GitHub] [lucene-solr] madrob commented on a change in pull request #2148: SOLR-15052: Per-replica states for reducing overseer bottlenecks

2020-12-22 Thread GitBox


madrob commented on a change in pull request #2148:
URL: https://github.com/apache/lucene-solr/pull/2148#discussion_r547543008



##
File path: 
solr/solrj/src/java/org/apache/solr/common/cloud/PerReplicaStates.java
##
@@ -0,0 +1,587 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.solr.common.cloud;
+
+import java.io.IOException;
+import java.lang.invoke.MethodHandles;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.LinkedHashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Objects;
+import java.util.Set;
+import java.util.function.BiConsumer;
+
+import org.apache.solr.cluster.api.SimpleMap;
+import org.apache.solr.common.MapWriter;
+import org.apache.solr.common.SolrException;
+import org.apache.solr.common.annotation.JsonProperty;
+import org.apache.solr.common.util.ReflectMapWriter;
+import org.apache.solr.common.util.StrUtils;
+import org.apache.solr.common.util.WrappedSimpleMap;
+import org.apache.zookeeper.CreateMode;
+import org.apache.zookeeper.KeeperException;
+import org.apache.zookeeper.data.ACL;
+import org.apache.zookeeper.data.Stat;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import static java.util.Collections.singletonList;
+import static org.apache.solr.common.params.CommonParams.NAME;
+import static org.apache.solr.common.params.CommonParams.VERSION;
+
+/**
+ * This represents the individual replica states in a collection
+ * This is an immutable object. When states are modified, a new instance is 
constructed
+ */
+public class PerReplicaStates implements ReflectMapWriter {
+  private static final Logger log = 
LoggerFactory.getLogger(MethodHandles.lookup().lookupClass());
+  public static final char SEPARATOR = ':';
+
+
+  @JsonProperty
+  public final String path;
+
+  @JsonProperty
+  public final int cversion;
+
+  @JsonProperty
+  public final SimpleMap<State> states;
+
+  public PerReplicaStates(String path, int cversion, List<String> states) {
+this.path = path;
+this.cversion = cversion;
+Map<String, State> tmp = new LinkedHashMap<>();
+
+for (String state : states) {
+  State rs = State.parse(state);
+  if (rs == null) continue;
+  State existing = tmp.get(rs.replica);
+  if (existing == null) {
+tmp.put(rs.replica, rs);
+  } else {
+tmp.put(rs.replica, rs.insert(existing));
+  }
+}
+this.states = new WrappedSimpleMap<>(tmp);
+
+  }
+
+  /**Get the changed replicas
+   */
+  public static Set<String> findModifiedReplicas(PerReplicaStates old, 
PerReplicaStates fresh) {
+Set<String> result = new HashSet<>();
+if (fresh == null) {
+  old.states.forEachKey(result::add);
+  return result;
+}
+old.states.forEachEntry((s, state) -> {
+  // the state is modified or missing
+  if (!Objects.equals(fresh.get(s) , state)) result.add(s);
+});
+fresh.states.forEachEntry((s, state) -> { if (old.get(s) == null ) 
result.add(s);
+});
+return result;
+  }
+
+  /**
+   * This is a persist operation with retry if a write fails due to stale state
+   */
+  public static void persist(WriteOps ops, String znode, SolrZkClient 
zkClient) throws KeeperException, InterruptedException {
+try {
+  persist(ops.get(), znode, zkClient);
+} catch (KeeperException.NodeExistsException | 
KeeperException.NoNodeException e) {
+  //state is stale
+  log.info("stale state for {} . retrying...", znode);
+  List<Op> freshOps = ops.get(PerReplicaStates.fetch(znode, zkClient, 
null));
+  persist(freshOps, znode, zkClient);
+  log.info("retried for stale state {}, succeeded", znode);
+}
+  }
+
+  /**
+   * Persist a set of operations to Zookeeper
+   */
+  public static void persist(List<Op> operations, String znode, SolrZkClient 
zkClient) throws KeeperException, InterruptedException {
+if (operations == null || operations.isEmpty()) return;
+log.debug("Per-replica state being persisted for :{}, ops: {}", znode, 
operations);
+
+List<Op> ops = new ArrayList<>(operations.size());
+for (Op op : operations) {
+  //the state of the replica is being updated
+  String path = znode + "/" + op.state.asString;
+  List<ACL> acl

[GitHub] [lucene-solr] trdillon commented on a change in pull request #2152: SOLR-14034: remove deprecated min_rf references

2020-12-22 Thread GitBox


trdillon commented on a change in pull request #2152:
URL: https://github.com/apache/lucene-solr/pull/2152#discussion_r547561917



##
File path: solr/core/src/test/org/apache/solr/cloud/HttpPartitionTest.java
##
@@ -548,9 +548,6 @@ protected int sendDoc(int docId, Integer minRf, SolrClient 
solrClient, String co
 doc.addField("a_t", "hello" + docId);
 
 UpdateRequest up = new UpdateRequest();
-if (minRf != null) {
-  up.setParam(UpdateRequest.MIN_REPFACT, String.valueOf(minRf));
-}

Review comment:
   @cpoerschke Thanks for all your help with this one.
   
   `sendDoc` seems OK to remove from `HttpPartitionTest` as it's only passed 
`minRf` as an argument, but in `ReplicationFactorTest` it is passed 
`expectedRf` as an argument in `addDocs`:
   
https://github.com/apache/lucene-solr/blob/98f12f4aeb9f6ec9f5c4de53f9faddc41043df59/solr/core/src/test/org/apache/solr/cloud/ReplicationFactorTest.java#L417
   
   `sendDocsWithRetry`  is implemented here and used in a few different test 
classes (`DistributedVersionInfoTest`, `LeaderFailoverAfterPartitionTest` and 
`ForceLeaderTest`) with `minRf` as an argument:
   
https://github.com/apache/lucene-solr/blob/98f12f4aeb9f6ec9f5c4de53f9faddc41043df59/solr/test-framework/src/java/org/apache/solr/cloud/AbstractFullDistribZkTestBase.java#L941
   
   But it's also implemented using `expectedRfDBQ` and `expectedRf` in 
`ReplicationFactorTest`:
   
   - 
https://github.com/apache/lucene-solr/blob/98f12f4aeb9f6ec9f5c4de53f9faddc41043df59/solr/core/src/test/org/apache/solr/cloud/ReplicationFactorTest.java#L230
   - 
https://github.com/apache/lucene-solr/blob/98f12f4aeb9f6ec9f5c4de53f9faddc41043df59/solr/core/src/test/org/apache/solr/cloud/ReplicationFactorTest.java#L427
   
   I wasn't sure how to proceed with this; looking at the tests, it seems that 
`expectedRf` is relied upon quite often.








[jira] [Updated] (SOLR-15052) Reducing overseer bottlenecks using per-replica states

2020-12-22 Thread Noble Paul (Jira)


 [ 
https://issues.apache.org/jira/browse/SOLR-15052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-15052:
--
Issue Type: Improvement  (was: Bug)

> Reducing overseer bottlenecks using per-replica states
> --
>
> Key: SOLR-15052
> URL: https://issues.apache.org/jira/browse/SOLR-15052
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Ishan Chattopadhyaya
>Priority: Major
> Attachments: per-replica-states-gcp.pdf
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> This work has the same goal as SOLR-13951, that is to reduce overseer 
> bottlenecks by avoiding replica state updates from going to the state.json 
> via the overseer. However, the approach taken here is different from 
> SOLR-13951, and hence this work supersedes that work.
> The design proposed is here: 
> https://docs.google.com/document/d/1xdxpzUNmTZbk0vTMZqfen9R3ArdHokLITdiISBxCFUg/edit
> Briefly,
> # Every replica's state will be in a separate znode nested under the 
> state.json. Its name encodes the replica name, state, and leadership status.
> # An additional children watcher is set on state.json to observe state changes.
> # Upon a state change, a ZK multi-op deletes the previous znode and adds a 
> new znode with the new state.
> Differences between this and SOLR-13951,
> # In SOLR-13951, we planned to leverage shard terms for per shard states.
> # As a consequence, the code changes required for SOLR-13951 were massive (we 
> needed a shard state provider abstraction and introduce it everywhere in the 
> codebase).
> # This approach is a drastically simpler change and design.
> Credits for this design and the PR are due to [~noble.paul]. 
> [~markrmil...@gmail.com], [~noble.paul] and I have collaborated on this 
> effort. The reference branch takes a conceptually similar (but not identical) 
> approach.
> I shall attach a PR and performance benchmarks shortly.






[jira] [Commented] (SOLR-15052) Reducing overseer bottlenecks using per-replica states

2020-12-22 Thread Noble Paul (Jira)


[ 
https://issues.apache.org/jira/browse/SOLR-15052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17253820#comment-17253820
 ] 

Noble Paul commented on SOLR-15052:
---

{quote}Then the {{R5}} update is also going to read the directory listing and 
execute.
{quote}
R5 would have gotten a callback and it would've updated the per-replica-states 
anyway. So, all that we are doing is an extra {{stat}} read, which is 
extremely cheap.
{quote}With 500 children znodes, getChildren took on my laptop about 10-15ms 
while getData on a single file with equivalent amount of text took longer at 
~20ms. This came as a surprise to me.
{quote}
Reads are not such a big deal. Even writes are not a big deal. But, CAS writes 
are a big deal. We would like to minimize contention while doing CAS writes.
{quote}The multi operation (delete znode, create znode) took about 40ms while 
the CAS of the text file was faster at 30ms,
{quote}
CAS in itself is not slow. As the number of parallel writes grows, the 
performance degrades dramatically. If you have thousands of replicas trying to 
update using CAS, the performance is going to be unacceptably low. Whereas the 
{{multi}} approach on individual znodes will perform the same irrespective of 
how many replicas we have.
{quote}The implementation in the PR could easily avoid systematically 
re-reading the znode children list by attempting the multi operation on the 
cached PerReplicaStates of the DocCollection
{quote}
It already uses the cached data. Yes, it does an extra version check, but 
that's cheap.
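To make the contrast concrete, the two write patterns look roughly like this against the raw ZooKeeper client (simplified sketch; {{merge}} and the znode names are placeholders):
{code:java}
// CAS on one shared state.json: read, modify, conditional write; under
// contention most writers loop on BadVersionException and must re-read.
Stat stat = new Stat();
byte[] current = zk.getData("/collections/c1/state.json", false, stat);
zk.setData("/collections/c1/state.json", merge(current), stat.getVersion());

// Per-replica znodes: one atomic multi per state change; writers touching
// different replicas never contend on a shared version counter.
zk.multi(Arrays.asList(
    Op.delete("/collections/c1/state.json/" + oldStateNode, -1),
    Op.create("/collections/c1/state.json/" + newStateNode, new byte[0],
        ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT)));
{code}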

> Reducing overseer bottlenecks using per-replica states
> --
>
> Key: SOLR-15052
> URL: https://issues.apache.org/jira/browse/SOLR-15052
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>Reporter: Ishan Chattopadhyaya
>Priority: Major
> Attachments: per-replica-states-gcp.pdf
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> This work has the same goal as SOLR-13951, that is to reduce overseer 
> bottlenecks by avoiding replica state updates from going to the state.json 
> via the overseer. However, the approach taken here is different from 
> SOLR-13951, and hence this work supersedes that work.
> The design proposed is here: 
> https://docs.google.com/document/d/1xdxpzUNmTZbk0vTMZqfen9R3ArdHokLITdiISBxCFUg/edit
> Briefly,
> # Every replica's state will be in a separate znode nested under the 
> state.json. Its name encodes the replica name, state, and leadership status.
> # An additional children watcher is set on state.json to observe state changes.
> # Upon a state change, a ZK multi-op deletes the previous znode and adds a 
> new znode with the new state.
> Differences between this and SOLR-13951,
> # In SOLR-13951, we planned to leverage shard terms for per shard states.
> # As a consequence, the code changes required for SOLR-13951 were massive (we 
> needed a shard state provider abstraction and introduce it everywhere in the 
> codebase).
> # This approach is a drastically simpler change and design.
> Credits for this design and the PR are due to [~noble.paul]. 
> [~markrmil...@gmail.com], [~noble.paul] and I have collaborated on this 
> effort. The reference branch takes a conceptually similar (but not identical) 
> approach.
> I shall attach a PR and performance benchmarks shortly.






[GitHub] [lucene-solr] tflobbe commented on pull request #2163: LUCENE-9647 Add back github action for Ant

2020-12-22 Thread GitBox


tflobbe commented on pull request #2163:
URL: https://github.com/apache/lucene-solr/pull/2163#issuecomment-749862991


   Does this go to master or to the 8.x branch? I don't think we had it running 
on 8.x before.


