[GitHub] [lucene-solr] iverase commented on a change in pull request #2155: LUCENE-9641: Support for spatial relationships in LatLonPoint
iverase commented on a change in pull request #2155: URL: https://github.com/apache/lucene-solr/pull/2155#discussion_r547165818

## File path: lucene/core/src/java/org/apache/lucene/document/LatLonPointQuery.java

## @@ -0,0 +1,174 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.document;
+
+import org.apache.lucene.document.ShapeField.QueryRelation;
+import org.apache.lucene.geo.Component2D;
+import org.apache.lucene.geo.GeoEncodingUtils;
+import org.apache.lucene.geo.LatLonGeometry;
+import org.apache.lucene.geo.Line;
+import org.apache.lucene.index.PointValues.Relation;
+import org.apache.lucene.util.NumericUtils;
+
+import java.util.Arrays;
+import java.util.function.Function;
+import java.util.function.Predicate;
+
+import static org.apache.lucene.geo.GeoEncodingUtils.decodeLatitude;
+import static org.apache.lucene.geo.GeoEncodingUtils.decodeLongitude;
+import static org.apache.lucene.geo.GeoEncodingUtils.encodeLatitude;
+import static org.apache.lucene.geo.GeoEncodingUtils.encodeLongitude;
+
+/**
+ * Finds all previously indexed geo points that satisfy the given {@link QueryRelation} with
+ * the specified array of {@link LatLonGeometry}.
+ *
+ * The field must be indexed using one or more {@link LatLonPoint} added per document.
+ **/
+final class LatLonPointQuery extends SpatialQuery {
+  final private LatLonGeometry[] geometries;
+  final private Component2D component2D;
+
+  /**
+   * Creates a query that matches all indexed shapes to the provided array of {@link LatLonGeometry}
+   */
+  LatLonPointQuery(String field, QueryRelation queryRelation, LatLonGeometry... geometries) {
+    super(field, queryRelation);
+    if (queryRelation == QueryRelation.WITHIN) {
+      for (LatLonGeometry geometry : geometries) {
+        if (geometry instanceof Line) {
+          // TODO: line queries do not support within relations
+          throw new IllegalArgumentException("LatLonPointQuery does not support " + QueryRelation.WITHIN + " queries with line geometries");
+        }
+      }
+    }
+    this.component2D = LatLonGeometry.create(geometries);
+    this.geometries = geometries.clone();
+  }
+
+  @Override
+  protected SpatialVisitor getSpatialVisitor() {
+    if (component2D.getMinY() > component2D.getMaxY()) {
+      // encodeLatitudeCeil may cause minY to be > maxY iff
+      // the delta between the latitudes is < the encoding resolution
+      return EMPTYVISITOR;

Review comment: The problem was in the way the component predicate is built. I had another thought and I think I pushed a better solution in 1cebeff.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] iverase commented on a change in pull request #2155: LUCENE-9641: Support for spatial relationships in LatLonPoint
iverase commented on a change in pull request #2155: URL: https://github.com/apache/lucene-solr/pull/2155#discussion_r547166564

## File path: lucene/core/src/java/org/apache/lucene/document/LatLonPointQuery.java

## @@ -0,0 +1,174 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.document;
+
+import org.apache.lucene.document.ShapeField.QueryRelation;
+import org.apache.lucene.geo.Component2D;
+import org.apache.lucene.geo.GeoEncodingUtils;
+import org.apache.lucene.geo.LatLonGeometry;
+import org.apache.lucene.geo.Line;
+import org.apache.lucene.index.PointValues.Relation;
+import org.apache.lucene.util.NumericUtils;
+
+import java.util.Arrays;
+import java.util.function.Function;
+import java.util.function.Predicate;
+
+import static org.apache.lucene.geo.GeoEncodingUtils.decodeLatitude;
+import static org.apache.lucene.geo.GeoEncodingUtils.decodeLongitude;
+import static org.apache.lucene.geo.GeoEncodingUtils.encodeLatitude;
+import static org.apache.lucene.geo.GeoEncodingUtils.encodeLongitude;
+
+/**
+ * Finds all previously indexed geo points that satisfy the given {@link QueryRelation} with
+ * the specified array of {@link LatLonGeometry}.
+ *
+ * The field must be indexed using one or more {@link LatLonPoint} added per document.
+ **/
+final class LatLonPointQuery extends SpatialQuery {
+  final private LatLonGeometry[] geometries;
+  final private Component2D component2D;
+
+  /**
+   * Creates a query that matches all indexed shapes to the provided array of {@link LatLonGeometry}
+   */
+  LatLonPointQuery(String field, QueryRelation queryRelation, LatLonGeometry... geometries) {
+    super(field, queryRelation);
+    if (queryRelation == QueryRelation.WITHIN) {
+      for (LatLonGeometry geometry : geometries) {
+        if (geometry instanceof Line) {
+          // TODO: line queries do not support within relations
+          throw new IllegalArgumentException("LatLonPointQuery does not support " + QueryRelation.WITHIN + " queries with line geometries");
+        }
+      }
+    }
+    this.component2D = LatLonGeometry.create(geometries);
+    this.geometries = geometries.clone();
+  }
+
+  @Override
+  protected SpatialVisitor getSpatialVisitor() {
+    if (component2D.getMinY() > component2D.getMaxY()) {
+      // encodeLatitudeCeil may cause minY to be > maxY iff
+      // the delta between the latitudes is < the encoding resolution
+      return EMPTYVISITOR;
+    }
+    final GeoEncodingUtils.Component2DPredicate component2DPredicate = GeoEncodingUtils.createComponentPredicate(component2D);
+    // bounding box over all geometries, this can speed up tree intersection/cheaply improve approximation for complex multi-geometries
+    final byte[] minLat = new byte[Integer.BYTES];
+    final byte[] maxLat = new byte[Integer.BYTES];
+    final byte[] minLon = new byte[Integer.BYTES];
+    final byte[] maxLon = new byte[Integer.BYTES];
+    NumericUtils.intToSortableBytes(encodeLatitude(component2D.getMinY()), minLat, 0);
+    NumericUtils.intToSortableBytes(encodeLatitude(component2D.getMaxY()), maxLat, 0);
+    NumericUtils.intToSortableBytes(encodeLongitude(component2D.getMinX()), minLon, 0);
+    NumericUtils.intToSortableBytes(encodeLongitude(component2D.getMaxX()), maxLon, 0);
+
+    return new SpatialVisitor() {
+      @Override
+      protected Relation relate(byte[] minPackedValue, byte[] maxPackedValue) {
+        if (Arrays.compareUnsigned(minPackedValue, 0, Integer.BYTES, maxLat, 0, Integer.BYTES) > 0 ||
+            Arrays.compareUnsigned(maxPackedValue, 0, Integer.BYTES, minLat, 0, Integer.BYTES) < 0 ||
+            Arrays.compareUnsigned(minPackedValue, Integer.BYTES, Integer.BYTES + Integer.BYTES, maxLon, 0, Integer.BYTES) > 0 ||
+            Arrays.compareUnsigned(maxPackedValue, Integer.BYTES, Integer.BYTES + Integer.BYTES, minLon, 0, Integer.BYTES) < 0) {
+          // outside of global bounding box range
+          return Relation.CELL_OUTSIDE_QUERY;
+        }
+
+        double cellMinLat = decodeLatitude(minPackedValue, 0);
+        double cellMinLon = decodeLongitude(minPackedValue, Integer.BYTES);
+        double cellMaxLat = decodeLatitude(maxPackedValue, 0);
[GitHub] [lucene-solr] murblanc commented on a change in pull request #2148: SOLR-15052: Per-replica states for reducing overseer bottlenecks
murblanc commented on a change in pull request #2148: URL: https://github.com/apache/lucene-solr/pull/2148#discussion_r547236069

## File path: solr/core/src/java/org/apache/solr/cloud/overseer/CollectionMutator.java

## @@ -136,8 +154,13 @@ public ZkWriteCommand modifyCollection(final ClusterState clusterState, ZkNodePr
       return ZkStateWriter.NO_OP;
     }

-    return new ZkWriteCommand(coll.getName(),
-        new DocCollection(coll.getName(), coll.getSlicesMap(), m, coll.getRouter(), coll.getZNodeVersion(), coll.getZNode()));
+    DocCollection collection = new DocCollection(coll.getName(), coll.getSlicesMap(), m, coll.getRouter(), coll.getZNodeVersion(), coll.getZNode());
+    if (replicaOps == null) {
+      return new ZkWriteCommand(coll.getName(), collection);
+    } else {
+      return new ZkWriteCommand(coll.getName(), collection, replicaOps, true);
+    }

Review comment: My bad, the two are different. When `replicaOps` is `null`, `ops` in `ZkWriteCommand` is set to `PerReplicaStates.WriteOps.touchChildren()`. When `replicaOps` is not `null`, `ops` in `ZkWriteCommand` is set to `replicaOps`.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
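A condensed view of the branching murblanc describes — a sketch only, with the constructor shapes taken from the quoted diff and the `touchChildren()` default taken from the comment rather than verified against the PR:

```java
DocCollection collection = new DocCollection(coll.getName(), coll.getSlicesMap(), m,
    coll.getRouter(), coll.getZNodeVersion(), coll.getZNode());
// Two-arg form: ops defaults to PerReplicaStates.WriteOps.touchChildren().
// Four-arg form: the caller-supplied replicaOps is passed through as ops.
return replicaOps == null
    ? new ZkWriteCommand(coll.getName(), collection)
    : new ZkWriteCommand(coll.getName(), collection, replicaOps, true);
```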
[GitHub] [lucene-solr] gerlowskija merged pull request #2146: SOLR-15049: Add TopLevelJoinQuery optimization for 'self-joins'
gerlowskija merged pull request #2146: URL: https://github.com/apache/lucene-solr/pull/2146 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-15049) Optimize TopLevelJoinQuery "self joins"
[ https://issues.apache.org/jira/browse/SOLR-15049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17253501#comment-17253501 ] ASF subversion and git services commented on SOLR-15049: Commit 8b272a0960b619664ae9abe4ea2812330f0b2d5d in lucene-solr's branch refs/heads/master from Jason Gerlowski [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=8b272a0 ] SOLR-15049: Add TopLevelJoinQuery optimization for 'self-joins' (#2146) > Optimize TopLevelJoinQuery "self joins" > --- > > Key: SOLR-15049 > URL: https://issues.apache.org/jira/browse/SOLR-15049 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: query parsers >Affects Versions: master (9.0) >Reporter: Jason Gerlowski >Assignee: Jason Gerlowski >Priority: Minor > Time Spent: 0.5h > Remaining Estimate: 0h > > The "TopLevelJoinQuery" join implementation added recently performs well in a > variety of circumstances. However we can improve the performance further by > adding logic to handle the special case where fields being joined are the > same (e.g. {{fromIndex}} is null and {{fromField}} == {{toField}}). > Currently this self-join usecase still does a lot of work to convert "from" > ordinals into "to" ordinals. If we make the query (or QParserPlugin?) smart > enough to recognize this special case we can skip this step and improve > performance greatly in this case. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
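The special case the description targets is cheap to detect. A minimal sketch with hypothetical names — this is not the committed implementation in #2146:

{code:java}
// A join is a "self join" when it stays in the same core (no fromIndex) and
// joins a field to itself; from-ordinals are then already to-ordinals, so the
// ordinal-translation step can be skipped entirely.
static boolean isSelfJoin(String fromIndex, String fromField, String toField) {
  return fromIndex == null && fromField.equals(toField);
}
{code}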
[jira] [Commented] (LUCENE-7344) Deletion by query of uncommitted docs not working with DV updates
[ https://issues.apache.org/jira/browse/LUCENE-7344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17253513#comment-17253513 ]

David Smiley commented on LUCENE-7344:
--------------------------------------

I know it's been 4 years, but I want to add that a Lucene Query has a Weight with a relatively new method, {{isCacheable}}, that is a reasonable approximation for whether Solr DBQ processing could skip the realtimeSearcher reopen. This method must return false if there are any DocValue-dependent clauses in the Query. It can also return false in some other cases (e.g. BooleanQuery with many clauses), but usually it will not.

> Deletion by query of uncommitted docs not working with DV updates
> -----------------------------------------------------------------
>
>          Key: LUCENE-7344
>          URL: https://issues.apache.org/jira/browse/LUCENE-7344
>      Project: Lucene - Core
>   Issue Type: Bug
>     Reporter: Ishan Chattopadhyaya
>     Priority: Major
>  Attachments: LUCENE-7344.patch, LUCENE-7344.patch, LUCENE-7344.patch, LUCENE-7344.patch, LUCENE-7344.patch, LUCENE-7344.patch, LUCENE-7344_hoss_experiment.patch
>
> When DVs are updated, delete by query doesn't work with the updated DV value.

-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
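A sketch of the check being suggested, using Lucene's {{Weight.isCacheable(LeafReaderContext)}}; the surrounding Solr wiring ({{searcher}}, {{query}}) is assumed, not taken from any patch:

{code:java}
// If the weight is cacheable on every segment, no DocValue-dependent clauses
// are involved, so (per the comment above) the realtimeSearcher reopen that
// Solr does for delete-by-query could plausibly be skipped.
Weight weight = searcher.createWeight(searcher.rewrite(query), ScoreMode.COMPLETE_NO_SCORES, 1f);
boolean cacheable = true;
for (LeafReaderContext ctx : searcher.getIndexReader().leaves()) {
  cacheable &= weight.isCacheable(ctx);
}
{code}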
[jira] [Commented] (SOLR-15049) Optimize TopLevelJoinQuery "self joins"
[ https://issues.apache.org/jira/browse/SOLR-15049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17253528#comment-17253528 ] ASF subversion and git services commented on SOLR-15049: Commit 7f8e260c124e94ddd5033ee30249cf56f813729e in lucene-solr's branch refs/heads/branch_8x from Jason Gerlowski [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=7f8e260 ] SOLR-15049: Add TopLevelJoinQuery optimization for 'self-joins' > Optimize TopLevelJoinQuery "self joins" > --- > > Key: SOLR-15049 > URL: https://issues.apache.org/jira/browse/SOLR-15049 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: query parsers >Affects Versions: master (9.0) >Reporter: Jason Gerlowski >Assignee: Jason Gerlowski >Priority: Minor > Time Spent: 0.5h > Remaining Estimate: 0h > > The "TopLevelJoinQuery" join implementation added recently performs well in a > variety of circumstances. However we can improve the performance further by > adding logic to handle the special case where fields being joined are the > same (e.g. {{fromIndex}} is null and {{fromField}} == {{toField}}). > Currently this self-join usecase still does a lot of work to convert "from" > ordinals into "to" ordinals. If we make the query (or QParserPlugin?) smart > enough to recognize this special case we can skip this step and improve > performance greatly in this case. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Resolved] (SOLR-15049) Optimize TopLevelJoinQuery "self joins"
[ https://issues.apache.org/jira/browse/SOLR-15049?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Gerlowski resolved SOLR-15049. Fix Version/s: master (9.0) 8.8 Resolution: Fixed > Optimize TopLevelJoinQuery "self joins" > --- > > Key: SOLR-15049 > URL: https://issues.apache.org/jira/browse/SOLR-15049 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) > Components: query parsers >Affects Versions: master (9.0) >Reporter: Jason Gerlowski >Assignee: Jason Gerlowski >Priority: Minor > Fix For: 8.8, master (9.0) > > Time Spent: 0.5h > Remaining Estimate: 0h > > The "TopLevelJoinQuery" join implementation added recently performs well in a > variety of circumstances. However we can improve the performance further by > adding logic to handle the special case where fields being joined are the > same (e.g. {{fromIndex}} is null and {{fromField}} == {{toField}}). > Currently this self-join usecase still does a lot of work to convert "from" > ordinals into "to" ordinals. If we make the query (or QParserPlugin?) smart > enough to recognize this special case we can skip this step and improve > performance greatly in this case. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] msokolov merged pull request #2157: LUCENE-9644: diversity heuristic for HNSW graph neighbor selection
msokolov merged pull request #2157: URL: https://github.com/apache/lucene-solr/pull/2157 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Created] (SOLR-15057) avoid unnecessary object retention in FacetRangeProcessor
Christine Poerschke created SOLR-15057: -- Summary: avoid unnecessary object retention in FacetRangeProcessor Key: SOLR-15057 URL: https://issues.apache.org/jira/browse/SOLR-15057 Project: Solr Issue Type: Task Reporter: Christine Poerschke Assignee: Christine Poerschke (pull request to follow) * The (private) {{doSubs}} method is a no-op if there are no sub-facets. * The (private) {{intersections}} and {{filters}} arrays are only used by the {{doSubs}} method. * The (private) {{rangeStats}} method currently always populates the {{intersections}} and {{filters}} arrays, even when nothing actually subsequently uses them. * If {{rangeStats}} only populated the {{intersections}} array when it's actually needed then the {{DocSet intersection}} object would remain local in scope and hence the garbage collector could collect it earlier. [https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.7.0/solr/core/src/java/org/apache/solr/search/facet/FacetRangeProcessor.java#L531-L555] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
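The shape of the change the bullets describe, as a hedged sketch — {{computeIntersection}}, {{hasSubFacets}}, and {{rangeFilter}} are illustrative stand-ins, not the actual FacetRangeProcessor members:

{code:java}
DocSet intersection = computeIntersection(range);   // hypothetical helper
countAcc.incrementCount(slot, intersection.size()); // counting always needs the size
if (hasSubFacets) {
  // Only retain the DocSet (and filter) when doSubs() will actually use them;
  // otherwise the intersection stays local in scope and can be collected sooner.
  intersections[slot] = intersection;
  filters[slot] = rangeFilter;
}
{code}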
[jira] [Commented] (LUCENE-9644) HNSW diverse neighbor selection heuristic
[ https://issues.apache.org/jira/browse/LUCENE-9644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17253546#comment-17253546 ] ASF subversion and git services commented on LUCENE-9644: - Commit e1cd426bce39abc4345b748d9cff5ff7fe10315f in lucene-solr's branch refs/heads/master from Michael Sokolov [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=e1cd426 ] LUCENE-9644: diversity heuristic for HNSW graph neighbor selection (#2157) * Additional options to KnnGraphTester to support benchmarking with ann-benchmarks * switch to parallel array-based storage in HnswGraph (was using LongHeap) > HNSW diverse neighbor selection heuristic > - > > Key: LUCENE-9644 > URL: https://issues.apache.org/jira/browse/LUCENE-9644 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Michael Sokolov >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > This will replace the simple nearest neighbor selection with a criterion that > takes into account the distance of the neighbors from each other. It is seen > to provide dramatically improved recall on at least two datasets, and is what > is being used by our reference implementation, hnswlib. The basic idea is > that when selecting the nearest neighbors to associate with a new node added > to the graph, we filter using a diversity criterion. If a candidate neighbor > is closer to an already-added (closer to the new node) neighbor than it is to > the new node, then we pass over it, moving on to more-distant, but presumably > more diverse neighbors. The same criterion is also (re-) applied to the > neighbor nodes' neighbors, since we add the links bidirectionally. > h2. Results: > h3. GloVe/Wikipedia > h4. baseline > ||recall||latency ms||nDoc||fanout|| maxConn|| beamWidth|| visited|| index > ms|| > |0.643|0.77| 10| 50| 32| 64| 1742| 22830| > |0.671|0.95| 10| 100|32| 64| 2141| 0| > |0.704|1.32| 10| 200|32| 64| 2923| 0| > |0.739|2.04| 10| 400|32| 64| 4382| 0| > |0.470|0.91| 100|50| 32| 64| 2068| 337081| > |0.496|1.21| 100|100|32| 64| 2548| 0| > |0.533|1.77| 100|200|32| 64| 3479| 0| > |0.573|2.58| 100|400|32| 64| 5257| 0| > h4. diverse > ||recall||latency ms||nDoc||fanout|| maxConn|| beamWidth|| visited|| index > ms|| > |0.801|0.57| 10| 50| 32| 64| 593|17985| > |0.840|0.67| 10| 100|32| 64| 738|0| > |0.883|0.97| 10| 200|32| 64| 1018| 0| > |0.921|1.36| 10| 400|32| 64| 1502| 0| > |0.723|0.71| 100|50| 32| 64| 860|298383| > |0.761|0.77| 100|100|32| 64| 1058| 0| > |0.806|1.06| 100|200|32| 64| 1442| 0| > |0.854|1.67| 100|400|32| 64| 2159| 0| > h3. Dataset from work: > h4. baseline > ||recall||latency ms||nDoc||fanout|| maxConn|| beamWidth|| visited|| index > ms|| > |0.933|1.41| 10| 50| 32| 64| 1496| 35462| > |0.948|1.39| 10| 100|32| 64| 1872| 0| > |0.961|2.10| 10| 200|32| 64| 2591| 0| > |0.972|3.04| 10| 400|32| 64| 3939| 0| > |0.827|1.34| 100|50| 32| 64| 1676| 535802| > |0.854|1.76| 100|100|32| 64| 2056| 0| > |0.887|2.47| 100|200|32| 64| 2761| 0| > |0.907|3.75| 100|400|32| 64| 4129| 0| > h4. 
diverse > ||recall||latency ms||nDoc||fanout|| maxConn|| beamWidth|| visited|| index > ms|| > |0.966|1.18| 10| 50| 32| 64| 1480| 37656| > |0.977|1.46| 10| 100|32| 64| 1832| 0| > |0.988|2.00| 10| 200|32| 64| 2472| 0| > |0.995|3.14| 10| 400|32| 64| 3629| 0| > |0.944|1.34| 100|50| 32| 64| 1780| 526834| > |0.959|1.71| 100|100|32| 64| | 0| > |0.975|2.30| 100|200|32| 64| 3041| 0| > |0.986|3.56| 100|400|32| 64| 4543| 0| -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
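The selection criterion from the description, as a standalone sketch; the class, names, and pluggable distance function are illustrative, not the committed HnswGraph code:

{code:java}
import java.util.List;
import java.util.function.BiFunction;

class DiversityCheck {
  // Keep a candidate only if it is closer to the new node than to every
  // neighbor already selected; otherwise skip it as non-diverse.
  static boolean diverse(float[] candidate, float[] newNode, List<float[]> selected,
                         BiFunction<float[], float[], Float> distance) {
    float distToNew = distance.apply(candidate, newNode);
    for (float[] s : selected) {
      if (distance.apply(candidate, s) < distToNew) {
        return false;
      }
    }
    return true;
  }
}
{code}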
[jira] [Resolved] (LUCENE-9644) HNSW diverse neighbor selection heuristic
[ https://issues.apache.org/jira/browse/LUCENE-9644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Sokolov resolved LUCENE-9644. - Resolution: Fixed > HNSW diverse neighbor selection heuristic > - > > Key: LUCENE-9644 > URL: https://issues.apache.org/jira/browse/LUCENE-9644 > Project: Lucene - Core > Issue Type: Improvement >Reporter: Michael Sokolov >Priority: Major > Time Spent: 20m > Remaining Estimate: 0h > > This will replace the simple nearest neighbor selection with a criterion that > takes into account the distance of the neighbors from each other. It is seen > to provide dramatically improved recall on at least two datasets, and is what > is being used by our reference implementation, hnswlib. The basic idea is > that when selecting the nearest neighbors to associate with a new node added > to the graph, we filter using a diversity criterion. If a candidate neighbor > is closer to an already-added (closer to the new node) neighbor than it is to > the new node, then we pass over it, moving on to more-distant, but presumably > more diverse neighbors. The same criterion is also (re-) applied to the > neighbor nodes' neighbors, since we add the links bidirectionally. > h2. Results: > h3. GloVe/Wikipedia > h4. baseline > ||recall||latency ms||nDoc||fanout|| maxConn|| beamWidth|| visited|| index > ms|| > |0.643|0.77| 10| 50| 32| 64| 1742| 22830| > |0.671|0.95| 10| 100|32| 64| 2141| 0| > |0.704|1.32| 10| 200|32| 64| 2923| 0| > |0.739|2.04| 10| 400|32| 64| 4382| 0| > |0.470|0.91| 100|50| 32| 64| 2068| 337081| > |0.496|1.21| 100|100|32| 64| 2548| 0| > |0.533|1.77| 100|200|32| 64| 3479| 0| > |0.573|2.58| 100|400|32| 64| 5257| 0| > h4. diverse > ||recall||latency ms||nDoc||fanout|| maxConn|| beamWidth|| visited|| index > ms|| > |0.801|0.57| 10| 50| 32| 64| 593|17985| > |0.840|0.67| 10| 100|32| 64| 738|0| > |0.883|0.97| 10| 200|32| 64| 1018| 0| > |0.921|1.36| 10| 400|32| 64| 1502| 0| > |0.723|0.71| 100|50| 32| 64| 860|298383| > |0.761|0.77| 100|100|32| 64| 1058| 0| > |0.806|1.06| 100|200|32| 64| 1442| 0| > |0.854|1.67| 100|400|32| 64| 2159| 0| > h3. Dataset from work: > h4. baseline > ||recall||latency ms||nDoc||fanout|| maxConn|| beamWidth|| visited|| index > ms|| > |0.933|1.41| 10| 50| 32| 64| 1496| 35462| > |0.948|1.39| 10| 100|32| 64| 1872| 0| > |0.961|2.10| 10| 200|32| 64| 2591| 0| > |0.972|3.04| 10| 400|32| 64| 3939| 0| > |0.827|1.34| 100|50| 32| 64| 1676| 535802| > |0.854|1.76| 100|100|32| 64| 2056| 0| > |0.887|2.47| 100|200|32| 64| 2761| 0| > |0.907|3.75| 100|400|32| 64| 4129| 0| > h4. diverse > ||recall||latency ms||nDoc||fanout|| maxConn|| beamWidth|| visited|| index > ms|| > |0.966|1.18| 10| 50| 32| 64| 1480| 37656| > |0.977|1.46| 10| 100|32| 64| 1832| 0| > |0.988|2.00| 10| 200|32| 64| 2472| 0| > |0.995|3.14| 10| 400|32| 64| 3629| 0| > |0.944|1.34| 100|50| 32| 64| 1780| 526834| > |0.959|1.71| 100|100|32| 64| | 0| > |0.975|2.30| 100|200|32| 64| 3041| 0| > |0.986|3.56| 100|400|32| 64| 4543| 0| -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] cpoerschke opened a new pull request #2160: SOLR-15057: avoid unnecessary object retention in FacetRangeProcessor
cpoerschke opened a new pull request #2160: URL: https://github.com/apache/lucene-solr/pull/2160 https://issues.apache.org/jira/browse/SOLR-15057 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (SOLR-15057) avoid unnecessary object retention in FacetRangeProcessor
[ https://issues.apache.org/jira/browse/SOLR-15057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christine Poerschke updated SOLR-15057: --- Description: * The (private) {{doSubs}} method is a no-op if there are no sub-facets. * The (private) {{intersections}} and {{filters}} arrays are only used by the {{doSubs}} method. * The (private) {{rangeStats}} method currently always populates the {{intersections}} and {{filters}} arrays, even when nothing actually subsequently uses them. * If {{rangeStats}} only populated the {{intersections}} array when it's actually needed then the {{DocSet intersection}} object would remain local in scope and hence the garbage collector could collect it earlier. [https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.7.0/solr/core/src/java/org/apache/solr/search/facet/FacetRangeProcessor.java#L531-L555] was: (pull request to follow) * The (private) {{doSubs}} method is a no-op if there are no sub-facets. * The (private) {{intersections}} and {{filters}} arrays are only used by the {{doSubs}} method. * The (private) {{rangeStats}} method currently always populates the {{intersections}} and {{filters}} arrays, even when nothing actually subsequently uses them. * If {{rangeStats}} only populated the {{intersections}} array when it's actually needed then the {{DocSet intersection}} object would remain local in scope and hence the garbage collector could collect it earlier. [https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.7.0/solr/core/src/java/org/apache/solr/search/facet/FacetRangeProcessor.java#L531-L555] > avoid unnecessary object retention in FacetRangeProcessor > - > > Key: SOLR-15057 > URL: https://issues.apache.org/jira/browse/SOLR-15057 > Project: Solr > Issue Type: Task >Reporter: Christine Poerschke >Assignee: Christine Poerschke >Priority: Minor > Time Spent: 10m > Remaining Estimate: 0h > > * The (private) {{doSubs}} method is a no-op if there are no sub-facets. > * The (private) {{intersections}} and {{filters}} arrays are only used by > the {{doSubs}} method. > * The (private) {{rangeStats}} method currently always populates the > {{intersections}} and {{filters}} arrays, even when nothing actually > subsequently uses them. > * If {{rangeStats}} only populated the {{intersections}} array when it's > actually needed then the {{DocSet intersection}} object would remain local in > scope and hence the garbage collector could collect it earlier. > [https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.7.0/solr/core/src/java/org/apache/solr/search/facet/FacetRangeProcessor.java#L531-L555] -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] muse-dev[bot] commented on a change in pull request #2153: LUCENE-9570: A non-mergable PR with reformatted code.
muse-dev[bot] commented on a change in pull request #2153: URL: https://github.com/apache/lucene-solr/pull/2153#discussion_r547326682

## File path: lucene/core/src/test/org/apache/lucene/util/TestSmallFloat.java

## @@ -167,22 +170,24 @@ public void testByte4() {
     }
   }

-  /***
-  // Do an exhaustive test of all possible floating point values
-  // for the 315 float against the original norm encoding in Similarity.
-  // Takes 75 seconds on my Pentium4 3GHz, with Java5 -server
+  @Ignore("One-time test.")
   public void testAllFloats() {
-    for(int i = Integer.MIN_VALUE;;i++) {
+    for (int i = Integer.MIN_VALUE; ; i++) {
       float f = Float.intBitsToFloat(i);
-      if (f==f) { // skip non-numbers
+      if (f == f) { // skip non-numbers

Review comment: *opt.semgrep.java.lang.correctness.eqeq.eqeq:* `f == f` or `f != f` is always true. (Unless the value compared is a float or double). To test if `f` is not-a-number, use `Double.isNaN(f)`.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
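For context on the bot's finding: `f == f` is false exactly when `f` is NaN, which is why the reflexive comparison works as a NaN filter here even though the explicit check reads better. A small illustration:

```java
float f = Float.intBitsToFloat(0x7fc00000); // one of the NaN bit patterns
System.out.println(f == f);          // false: NaN is unequal to itself
System.out.println(Float.isNaN(f));  // true: the idiomatic test
```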
[GitHub] [lucene-solr] cpoerschke commented on a change in pull request #2152: SOLR-14034: remove deprecated min_rf references
cpoerschke commented on a change in pull request #2152: URL: https://github.com/apache/lucene-solr/pull/2152#discussion_r547337314

## File path: solr/core/src/test/org/apache/solr/cloud/HttpPartitionTest.java

## @@ -548,9 +548,6 @@ protected int sendDoc(int docId, Integer minRf, SolrClient solrClient, String co
     doc.addField("a_t", "hello" + docId);
     UpdateRequest up = new UpdateRequest();
-    if (minRf != null) {
-      up.setParam(UpdateRequest.MIN_REPFACT, String.valueOf(minRf));
-    }

Review comment: Hi @trdillon - thanks for opening this pull request! Okay, I think I see it now, so your https://issues.apache.org/jira/browse/SOLR-14034?focusedCommentId=17223427&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17223427 question is about this sort of test change here, i.e. whether `minRf` should remain as an unused `sendDoc` argument or be removed. I'm thinking here in `HttpPartitionTest` removal makes sense (haven't yet looked at `ReplicationFactorTest`). What do you think?

## File path: solr/core/src/test/org/apache/solr/cloud/ReplicationFactorTest.java

## @@ -461,38 +447,24 @@ protected void doDelete(UpdateRequest req, String msg, int expectedRf, int retri
   protected int sendDoc(int docId, int minRf) throws Exception {
     UpdateRequest up = new UpdateRequest();
-    boolean minRfExplicit = maybeAddMinRfExplicitly(minRf, up);
     SolrInputDocument doc = new SolrInputDocument();
     doc.addField(id, String.valueOf(docId));
     doc.addField("a_t", "hello" + docId);
     up.add(doc);
-    return runAndGetAchievedRf(up, minRfExplicit, minRf);
+    return runAndGetAchievedRf(up, minRf);
   }

-  private int runAndGetAchievedRf(UpdateRequest up, boolean minRfExplicit, int minRf) throws SolrServerException, IOException {
+  private int runAndGetAchievedRf(UpdateRequest up, int minRf) throws SolrServerException, IOException {
     NamedList response = cloudClient.request(up);
-    if (minRfExplicit) {
-      assertMinRfInResponse(minRf, response);
-    }
     return cloudClient.getMinAchievedReplicationFactor(cloudClient.getDefaultCollection(), response);
   }

   private void assertMinRfInResponse(int minRf, NamedList response) {
-    Object minRfFromResponse = response.findRecursive("responseHeader", UpdateRequest.MIN_REPFACT);
+    Object minRfFromResponse = response.findRecursive("responseHeader");
     assertNotNull("Expected min_rf header in the response", minRfFromResponse);
     assertEquals("Unexpected min_rf in response", ((Integer)minRfFromResponse).intValue(), minRf);
   }

Review comment: Looks like `assertMinRfInResponse` is now also unused; if so, suggest removing it too.

## File path: solr/core/src/test/org/apache/solr/cloud/ReplicationFactorTest.java

## @@ -461,38 +447,24 @@ protected void doDelete(UpdateRequest req, String msg, int expectedRf, int retri
   protected int sendDoc(int docId, int minRf) throws Exception {
     UpdateRequest up = new UpdateRequest();
-    boolean minRfExplicit = maybeAddMinRfExplicitly(minRf, up);
     SolrInputDocument doc = new SolrInputDocument();
     doc.addField(id, String.valueOf(docId));
     doc.addField("a_t", "hello" + docId);
     up.add(doc);
-    return runAndGetAchievedRf(up, minRfExplicit, minRf);
+    return runAndGetAchievedRf(up, minRf);
   }

-  private int runAndGetAchievedRf(UpdateRequest up, boolean minRfExplicit, int minRf) throws SolrServerException, IOException {

Review comment: How about also removing the `minRf` argument here since it's now no longer used?

## File path: solr/solrj/src/java/org/apache/solr/client/solrj/impl/BaseCloudSolrClient.java

## @@ -716,7 +716,6 @@
       if (rf == null || routeRf < rf) rf = routeRf;
     }
-    minRf = (Integer)header.get(UpdateRequest.MIN_REPFACT);

Review comment: Suggest also removing the `Integer minRf = null;` at line 696 above.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] cpoerschke commented on pull request #2152: SOLR-14034: remove deprecated min_rf references
cpoerschke commented on pull request #2152: URL: https://github.com/apache/lucene-solr/pull/2152#issuecomment-749604771 At https://github.com/apache/lucene-solr/blob/releases/lucene-solr/8.7.0/solr/core/src/java/org/apache/solr/update/processor/DistributedUpdateProcessor.java#L754-L756 there's also a `Implementing min_rf here was a bit tricky. ...` comment reference to `min_rf` -- any thoughts on either removing it or rewording it somehow? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14963) Child "rows" param should apply per level
[ https://issues.apache.org/jira/browse/SOLR-14963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17253585#comment-17253585 ]

David Smiley commented on SOLR-14963:
--------------------------------------

It'd be great to have something for 8.8, which means getting it committed within about two weeks. I'm also pushing for SOLR-14923 shortly. Let me know if you need any further guidance.

> Child "rows" param should apply per level
> ------------------------------------------
>
>            Key: SOLR-14963
>            URL: https://issues.apache.org/jira/browse/SOLR-14963
>        Project: Solr
>     Issue Type: Bug
> Security Level: Public (Default Security Level. Issues are Public)
>       Reporter: David Smiley
>       Priority: Major
>     Time Spent: 20m
> Remaining Estimate: 0h
>
> The {{[child rows=10]}} doc transformer "rows" param _should_ apply per parent, and it's documented this way: "The maximum number of child documents to be returned per parent document.". However, it is instead implemented as an overall limit, as the child documents are processed in depth-first order. The implementation ought to change.

-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
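To make the parameter concrete, a hypothetical SolrJ request using the documented transformer syntax (the collection, query, and field names are invented):

{code:java}
// Ask for each matching parent plus at most 10 of its children *per parent* --
// the per-parent semantics this issue says the documentation promises.
SolrQuery q = new SolrQuery("{!parent which=type_s:parent}text_t:hello");
q.setFields("id", "[child rows=10]");
QueryResponse rsp = client.query("collection1", q);
{code}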
[jira] [Resolved] (SOLR-14088) Tika and commons-compress dependency in solr core causes classloader issue
[ https://issues.apache.org/jira/browse/SOLR-14088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kevin Risden resolved SOLR-14088.
---------------------------------
    Resolution: Cannot Reproduce

This hasn't come up again and I'm not planning to track it down any further right now. Marking as "Cannot Reproduce"

> Tika and commons-compress dependency in solr core causes classloader issue
> ---------------------------------------------------------------------------
>
>          Key: SOLR-14088
>          URL: https://issues.apache.org/jira/browse/SOLR-14088
>      Project: Solr
>   Issue Type: Bug
>   Components: contrib - Solr Cell (Tika extraction)
>     Reporter: Kevin Risden
>     Priority: Major
>
> SOLR-14086 found that if commons-compress is in core ivy.xml as a compile dependency, it messes up the classloader for any commons-compress dependencies. It causes issues with items like xz being loaded.
> This is problematic where dependencies shouldn't matter based on classloader. This jira is to determine if there is something wrong w/ Solr's classloader or if it's a commons-compress issue only.
> Error message from SOLR-14086 copied below:
> {code:java}
> Error 500 java.lang.NoClassDefFoundError: Could not initialize class org.apache.commons.compress.archivers.sevenz.Coders
> HTTP ERROR 500 java.lang.NoClassDefFoundError: Could not initialize class org.apache.commons.compress.archivers.sevenz.Coders
> URI: /solr/tika-integration-example/update/extract
> STATUS: 500
> MESSAGE: java.lang.NoClassDefFoundError: Could not initialize class org.apache.commons.compress.archivers.sevenz.Coders
> SERVLET: default
> CAUSED BY: java.lang.NoClassDefFoundError: Could not initialize class org.apache.commons.compress.archivers.sevenz.Coders
> Caused by: java.lang.NoClassDefFoundError: Could not initialize class org.apache.commons.compress.archivers.sevenz.Coders
> 	at org.apache.commons.compress.archivers.sevenz.SevenZFile.readEncodedHeader(SevenZFile.java:437)
> 	at org.apache.commons.compress.archivers.sevenz.SevenZFile.readHeaders(SevenZFile.java:355)
> 	at org.apache.commons.compress.archivers.sevenz.SevenZFile.<init>(SevenZFile.java:241)
> 	at org.apache.commons.compress.archivers.sevenz.SevenZFile.<init>(SevenZFile.java:108)
> 	at org.apache.commons.compress.archivers.sevenz.SevenZFile.<init>(SevenZFile.java:262)
> 	at org.apache.tika.parser.pkg.PackageParser.parse(PackageParser.java:257)
> 	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> 	at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> 	at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
> 	at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:228)
> 	at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
> 	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:208)
> 	at org.apache.solr.core.SolrCore.execute(SolrCore.java:2582)
> 	at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:799)
> 	at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:578)
> 	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:419)
> 	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:351)
> 	at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1596)
> 	at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:545)
> 	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
> 	at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:590)
> 	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
> 	at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)
> 	at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1607)
> 	at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
> 	at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1297)
> 	at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
> 	at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:485)
> 	at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1577)
> 	at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
> 	at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1212)
> 	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandle
[jira] [Commented] (SOLR-15051) Shared storage -- BlobDirectory (de-duping)
[ https://issues.apache.org/jira/browse/SOLR-15051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17253596#comment-17253596 ] Kevin Risden commented on SOLR-15051: - I can't add a comment to the design doc, but wanted to address potentially misleading statements around the Solr HDFS integration. {quote}Has an unfortunate search performance penalty. TODO ___ %. Some indexing penalty too: ___ %.{quote} There will be a performance penalty here coming from remote storage. I don't think this is completely avoidable. The biggest issue is on the indexing side where we need to ensure that documents are reliably written, but this isn't exactly fast on remote storage. {quote}The implementation relies on a “BlockCache”, which means running Solr with large Java heaps.{quote} The BlockCache is off heap typically with Java direct memory so shouldn't require a large Java heap. {quote} It’s not a generalized shared storage scheme; it’s HDFS specific. It’s possible to plug in S3 and Alluxio to this but there is overhead. HDFS is rather complex to operate, whereas say S3 is provided by cloud hosting providers natively. {quote} I'm not sure I understand this statement. There are a few parts to Hadoop. HDFS is the storage layer that can be complex to operate. The more interesting part is the Hadoop filesystem interface that is a semi generic adapter between the HDFS API and other storage backends (S3, ABFS, GCS, etc). The two pieces are separate and don't require each other to operate. The Hadoop filesystem interface provides the abstraction necessary to go between local filesystem to a lot of other cloud provider storage mechanisms. There may be some overhead there, but I know there has been a lot of work in the past 1-2 years where the performance has been improved since there has been a push for cloud storage support. > Shared storage -- BlobDirectory (de-duping) > --- > > Key: SOLR-15051 > URL: https://issues.apache.org/jira/browse/SOLR-15051 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: David Smiley >Assignee: David Smiley >Priority: Major > > This proposal is a way to accomplish shared storage in SolrCloud with a few > key characteristics: (A) using a Directory implementation, (B) delegates to a > backing local file Directory as a kind of read/write cache (C) replicas have > their own "space", (D) , de-duplication across replicas via reference > counting, (E) uses ZK but separately from SolrCloud stuff. > The Directory abstraction is a good one, and helps isolate shared storage > from the rest of SolrCloud that doesn't care. Using a backing normal file > Directory is faster for reads and is simpler than Solr's HDFSDirectory's > BlockCache. Replicas having their own space solves the problem of multiple > writers (e.g. of the same shard) trying to own and write to the same space, > and it implies that any of Solr's replica types can be used along with what > goes along with them like peer-to-peer replication (sometimes faster/cheaper > than pulling from shared storage). A de-duplication feature solves needless > duplication of files across replicas and from parent shards (i.e. from shard > splitting). The de-duplication feature requires a place to cache directory > listings so that they can be shared across replicas and atomically updated; > this is handled via ZooKeeper. 
Finally, some sort of Solr daemon / > auto-scaling code should be added to implement "autoAddReplicas", especially > to provide for a scenario where the leader is gone and can't be replicated > from directly but we can access shared storage. > For more about shared storage concepts, consider looking at the description > in SOLR-13101 and the linked Google Doc. > *[PROPOSAL > DOC|https://docs.google.com/document/d/1kjQPK80sLiZJyRjek_Edhokfc5q9S3ISvFRM2_YeL8M/edit?usp=sharing]* -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14688) First party package implementation design
[ https://issues.apache.org/jira/browse/SOLR-14688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17253598#comment-17253598 ]

Jan Høydahl commented on SOLR-14688:
------------------------------------

So, how can we achieve these two goals?

# Support locally/statically installed packages on each node, avoiding the need for access to a remote repo at runtime in prod, or access to ZK during image build?
# Have each Solr node independently decide which version of a package to choose based on its compatibility spec, to support rolling upgrade, e.g. 8.7 -> 8.8?

> First party package implementation design
> ------------------------------------------
>
>          Key: SOLR-14688
>          URL: https://issues.apache.org/jira/browse/SOLR-14688
>      Project: Solr
>   Issue Type: Improvement
>     Reporter: Noble Paul
>     Priority: Major
>       Labels: package, packagemanager
>
> Here's the design document for first party packages: https://docs.google.com/document/d/1n7gB2JAdZhlJKFrCd4Txcw4HDkdk7hlULyAZBS-wXrE/edit?usp=sharing
> Put differently, this is about package-ifying our "contribs".

-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (LUCENE-9646) Set BM25Similarity discountOverlaps via the constructor
[ https://issues.apache.org/jira/browse/LUCENE-9646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Marty updated LUCENE-9646: -- Summary: Set BM25Similarity discountOverlaps via the constructor (was: Set BM25Similarity discountOverlaps through the constructor) > Set BM25Similarity discountOverlaps via the constructor > --- > > Key: LUCENE-9646 > URL: https://issues.apache.org/jira/browse/LUCENE-9646 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: master (9.0) >Reporter: Patrick Marty >Priority: Trivial > > BM25Similarity discountOverlaps parameter is true by default. > It can be set with > {{org.apache.lucene.search.similarities.BM25Similarity#setDiscountOverlaps}} > method. > But this method makes BM25Similarity mutable. > > discountOverlaps should be set via the constructor and > {{setDiscountOverlaps}} method should be removed to make BM25Similarity > immutable. > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
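A sketch of the API shape the issue proposes — the three-argument constructor is the proposal, not the current Lucene API:

{code:java}
// Proposed immutable form: discountOverlaps is fixed at construction time.
Similarity sim = new BM25Similarity(1.2f, 0.75f, /* discountOverlaps */ false);

// Current mutable pattern the issue wants to remove:
// BM25Similarity sim = new BM25Similarity(1.2f, 0.75f);
// sim.setDiscountOverlaps(false);
{code}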
[jira] [Commented] (SOLR-15052) Reducing overseer bottlenecks using per-replica states
[ https://issues.apache.org/jira/browse/SOLR-15052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17253612#comment-17253612 ] Ilan Ginzburg commented on SOLR-15052: -- I agree [~noble.paul]. There's no serialization, I meant to say the list of znode children is basically read for each replica update. Say we get a sequence of updates for {{R1, R2, R3, R4, R5}}. Assuming {{R1}} and {{R2}} arrive at the same time, they can be executed at the same time as per your example (assuming the {{DocCollection getPerReplicaStates()}} is up to date). If slightly later {{R3}} and {{R4}} arrive together, each is going to see a changed {{cversion}} if they haven't re-read the directory listing after the {{R1}} & {{R2}} update. Each is going to re-read the directory before executing the update (done in [PerReplicaStates.fetch L166|https://github.com/apache/lucene-solr/pull/2148/files#diff-0bd8a828302915c525c8df3e8cccdc9881ebad121359c0dbc8374b8b72995669R166] called from [ZkController.publish L1622|https://github.com/apache/lucene-solr/pull/2148/files#diff-5b63503605ede4384429e74d1fa0c410adc5da8f3246e8c36e49feff2f3ea692R1622] before the call to [PerReplicaStates.persist L107|https://github.com/apache/lucene-solr/pull/2148/files#diff-0bd8a828302915c525c8df3e8cccdc9881ebad121359c0dbc8374b8b72995669R107] doing the actual [multi (L136)|https://github.com/apache/lucene-solr/pull/2148/files#diff-0bd8a828302915c525c8df3e8cccdc9881ebad121359c0dbc8374b8b72995669R136] operation). Then the {{R5}} update is also going to read the directory listing and execute. Basically, unless the {{PerReplicaStates}} stored in {{DocCollection}} is up to date for other reasons and new update requests arrive at exactly the same time, then each new replica update request triggers a new read of the directory listing. Updates are not serialized ({{R3}} and {{R4}} can execute in parallel), but there's some inefficiency in the way they're handled. I wanted to see the actual impact of this. Based on [~ichattopadhyaya]'s test [StateListVsCASSpinlock.java|https://raw.githubusercontent.com/chatman/experiments/main/src/main/java/StateListVsCASSpinlock.java] I tried to get an idea of the costs of the different actions. With 500 children znodes, {{getChildren}} took on my laptop about 10-15ms while {{getData}} on a single file with equivalent amount of text took longer at ~20ms. This came as a surprise to me. The multi operation (delete znode, create znode) took about 40ms while the CAS of the text file was faster at 30ms, but there were many retries in CAS as expected that considerably slowed down the process (got a speedup of over 10x by using the independent znodes vs a single text file with CAS with 500 replicas). The implementation in the PR could easily avoid systematically re-reading the znode children list by attempting the multi operation on the cached {{PerReplicaStates}} of the {{DocCollection}} (if not {{null}}). Only if the multi fails should it re-read the directory listing and try again. Maybe not worth it at this point though (but something to keep in mind). > Reducing overseer bottlenecks using per-replica states > -- > > Key: SOLR-15052 > URL: https://issues.apache.org/jira/browse/SOLR-15052 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. 
Issues are Public) >Reporter: Ishan Chattopadhyaya >Priority: Major > Attachments: per-replica-states-gcp.pdf > > Time Spent: 3h 10m > Remaining Estimate: 0h > > This work has the same goal as SOLR-13951, that is to reduce overseer > bottlenecks by avoiding replica state updates from going to the state.json > via the overseer. However, the approach taken here is different from > SOLR-13951 and hence this work supercedes that work. > The design proposed is here: > https://docs.google.com/document/d/1xdxpzUNmTZbk0vTMZqfen9R3ArdHokLITdiISBxCFUg/edit > Briefly, > # Every replica's state will be in a separate znode nested under the > state.json. It has the name that encodes the replica name, state, leadership > status. > # An additional children watcher to be set on state.json for state changes. > # Upon a state change, a ZK multi-op to delete the previous znode and add a > new znode with new state. > Differences between this and SOLR-13951, > # In SOLR-13951, we planned to leverage shard terms for per shard states. > # As a consequence, the code changes required for SOLR-13951 were massive (we > needed a shard state provider abstraction and introduce it everywhere in the > codebase). > # This approach is a drastically simpler change and design. > Credits for this design and the PR is due to [~noble.paul]. > [~markrmil...@gmail.com], [~noble.paul] and I have collaborated on this > effort. The reference
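The multi operation Ilan references, sketched directly against the raw ZooKeeper client API; the znode paths and name encoding are illustrative, and the PR wraps this in PerReplicaStates.persist rather than calling it inline:

{code:java}
// Atomically swap a per-replica state child under state.json: delete the old
// child and create the new one in a single ZooKeeper multi transaction.
List<Op> ops = Arrays.asList(
    Op.delete(statePath + "/core_node1:1:A", -1),        // old state, any version
    Op.create(statePath + "/core_node1:2:D", new byte[0],
        ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT));
zk.multi(ops); // zk is an org.apache.zookeeper.ZooKeeper handle (assumed)
{code}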
[GitHub] [lucene-solr] patrickmarty opened a new pull request #2161: LUCENE-9646: Set BM25Similarity discountOverlaps via the constructor
patrickmarty opened a new pull request #2161: URL: https://github.com/apache/lucene-solr/pull/2161

# Description

BM25Similarity's discountOverlaps parameter can be set with the org.apache.lucene.search.similarities.BM25Similarity#setDiscountOverlaps method. But this method makes BM25Similarity mutable.

# Solution

discountOverlaps should be set via the constructor, and the setDiscountOverlaps method should be removed to make BM25Similarity immutable.

# Tests

All tests involving BM25Similarity and LegacyBM25Similarity have been updated.

# Checklist

Please review the following and check all that apply:

- [ ] I have reviewed the guidelines for [How to Contribute](https://wiki.apache.org/solr/HowToContribute) and my code conforms to the standards described there to the best of my ability.
- [ ] I have created a Jira issue and added the issue ID to my pull request title.
- [ ] I have given Solr maintainers [access](https://help.github.com/en/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork) to contribute to my PR branch. (optional but recommended)
- [ ] I have developed this patch against the `master` branch.
- [ ] I have run `./gradlew check`.
- [ ] I have added tests for my changes.
- [ ] I have added documentation for the [Ref Guide](https://github.com/apache/lucene-solr/tree/master/solr/solr-ref-guide) (for Solr changes only).

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (LUCENE-9646) Set BM25Similarity discountOverlaps via the constructor
[ https://issues.apache.org/jira/browse/LUCENE-9646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Marty updated LUCENE-9646: -- Description: BM25Similarity discountOverlaps parameter is true by default. It can be set with {{org.apache.lucene.search.similarities.BM25Similarity#setDiscountOverlaps}} method. But this method makes BM25Similarity mutable. discountOverlaps should be set via the constructor and {{setDiscountOverlaps}} method should be removed to make BM25Similarity immutable. PR https://github.com/apache/lucene-solr/pull/2161 was: BM25Similarity discountOverlaps parameter is true by default. It can be set with {{org.apache.lucene.search.similarities.BM25Similarity#setDiscountOverlaps}} method. But this method makes BM25Similarity mutable. discountOverlaps should be set via the constructor and {{setDiscountOverlaps}} method should be removed to make BM25Similarity immutable. > Set BM25Similarity discountOverlaps via the constructor > --- > > Key: LUCENE-9646 > URL: https://issues.apache.org/jira/browse/LUCENE-9646 > Project: Lucene - Core > Issue Type: Improvement > Components: core/search >Affects Versions: master (9.0) >Reporter: Patrick Marty >Priority: Trivial > Time Spent: 10m > Remaining Estimate: 0h > > BM25Similarity discountOverlaps parameter is true by default. > It can be set with > {{org.apache.lucene.search.similarities.BM25Similarity#setDiscountOverlaps}} > method. > But this method makes BM25Similarity mutable. > > discountOverlaps should be set via the constructor and > {{setDiscountOverlaps}} method should be removed to make BM25Similarity > immutable. > > PR https://github.com/apache/lucene-solr/pull/2161 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] risdenk closed pull request #2008: SOLR-14951: Upgrade Angular JS 1.7.9 to 1.8.0
risdenk closed pull request #2008: URL: https://github.com/apache/lucene-solr/pull/2008 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14951) Upgrade Angular JS 1.7.9 to 1.8.0
[ https://issues.apache.org/jira/browse/SOLR-14951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17253640#comment-17253640 ] ASF subversion and git services commented on SOLR-14951: Commit f0b73fdc6d8d10653a2239de3071a524310f84e2 in lucene-solr's branch refs/heads/master from Kevin Risden [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=f0b73fd ] SOLR-14951: Upgrade Angular JS 1.7.9 to 1.8.0 Closes PR #2008 > Upgrade Angular JS 1.7.9 to 1.8.0 > - > > Key: SOLR-14951 > URL: https://issues.apache.org/jira/browse/SOLR-14951 > Project: Solr > Issue Type: Task > Security Level: Public(Default Security Level. Issues are Public) > Components: Admin UI >Reporter: Kevin Risden >Assignee: Kevin Risden >Priority: Major > Fix For: 8.8 > > Time Spent: 10m > Remaining Estimate: 0h > > Angular JS released 1.8.0 to fix some security vulnerabilities. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14951) Upgrade Angular JS 1.7.9 to 1.8.0
[ https://issues.apache.org/jira/browse/SOLR-14951?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17253670#comment-17253670 ] ASF subversion and git services commented on SOLR-14951: Commit 9e279fda14f937d7bac9721a7a5ee8d1e41f7050 in lucene-solr's branch refs/heads/branch_8x from Kevin Risden [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=9e279fd ] SOLR-14951: Upgrade Angular JS 1.7.9 to 1.8.0 Closes PR #2008 > Upgrade Angular JS 1.7.9 to 1.8.0 > - > > Key: SOLR-14951 > URL: https://issues.apache.org/jira/browse/SOLR-14951 > Project: Solr > Issue Type: Task > Security Level: Public(Default Security Level. Issues are Public) > Components: Admin UI >Reporter: Kevin Risden >Assignee: Kevin Risden >Priority: Major > Fix For: 8.8 > > Time Spent: 20m > Remaining Estimate: 0h > > Angular JS released 1.8.0 to fix some security vulnerabilities. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (SOLR-14951) Upgrade Angular JS 1.7.9 to 1.8.0
[ https://issues.apache.org/jira/browse/SOLR-14951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kevin Risden updated SOLR-14951: Resolution: Fixed Status: Resolved (was: Patch Available) > Upgrade Angular JS 1.7.9 to 1.8.0 > - > > Key: SOLR-14951 > URL: https://issues.apache.org/jira/browse/SOLR-14951 > Project: Solr > Issue Type: Task > Security Level: Public(Default Security Level. Issues are Public) > Components: Admin UI >Reporter: Kevin Risden >Assignee: Kevin Risden >Priority: Major > Fix For: 8.8 > > Time Spent: 20m > Remaining Estimate: 0h > > Angular JS released 1.8.0 to fix some security vulnerabilities. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-15051) Shared storage -- BlobDirectory (de-duping)
[ https://issues.apache.org/jira/browse/SOLR-15051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17253696#comment-17253696 ] David Smiley commented on SOLR-15051: - Thanks for the look [~krisden]! bq. There will be a performance penalty here coming from remote storage. For read performance, if cached locally (BlobDirectory does this), there isn't. BlobDirectory simply delegates reads to {{MMapDirectory}}. I've seen [~tpot]'s presentation showing that {{HdfsDirectory}} has a read-time performance hit. bq. The BlockCache is off heap typically with Java direct memory so shouldn't require a large Java heap. Oops; thanks for the correction! RE storage APIs/abstractions: Are you claiming that the "Hadoop filesystem interface" is an ideal choice for a BlobDirectory backing store abstraction? BlobDirectory or whatever shared system needs to write to *something*, so I'm sincere in asking for your opinion on what that something should be. I have basically no HDFS experience so I was unaware of its generic API. Even if that interface is nice... I suspect BlobDirectory ought to have some simple abstraction anyway, but I'm really not hung up on this choice. I'm leery of adding heavy or too many dependencies. > Shared storage -- BlobDirectory (de-duping) > --- > > Key: SOLR-15051 > URL: https://issues.apache.org/jira/browse/SOLR-15051 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: David Smiley >Assignee: David Smiley >Priority: Major > > This proposal is a way to accomplish shared storage in SolrCloud with a few > key characteristics: (A) using a Directory implementation, (B) delegates to a > backing local file Directory as a kind of read/write cache (C) replicas have > their own "space", (D) de-duplication across replicas via reference > counting, (E) uses ZK but separately from SolrCloud stuff. > The Directory abstraction is a good one, and helps isolate shared storage > from the rest of SolrCloud that doesn't care. Using a backing normal file > Directory is faster for reads and is simpler than Solr's HDFSDirectory's > BlockCache. Replicas having their own space solves the problem of multiple > writers (e.g. of the same shard) trying to own and write to the same space, > and it implies that any of Solr's replica types can be used along with what > goes along with them like peer-to-peer replication (sometimes faster/cheaper > than pulling from shared storage). A de-duplication feature solves needless > duplication of files across replicas and from parent shards (i.e. from shard > splitting). The de-duplication feature requires a place to cache directory > listings so that they can be shared across replicas and atomically updated; > this is handled via ZooKeeper. Finally, some sort of Solr daemon / > auto-scaling code should be added to implement "autoAddReplicas", especially > to provide for a scenario where the leader is gone and can't be replicated > from directly but we can access shared storage. > For more about shared storage concepts, consider looking at the description > in SOLR-13101 and the linked Google Doc. > *[PROPOSAL > DOC|https://docs.google.com/document/d/1kjQPK80sLiZJyRjek_Edhokfc5q9S3ISvFRM2_YeL8M/edit?usp=sharing]* -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
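To illustrate the read-path claim above: once a file is cached locally, a blob-backed directory can serve reads entirely from the local (e.g. memory-mapped) copy. The following is a minimal sketch under stated assumptions, not the actual BlobDirectory code; the {{BlobStore}} interface here is hypothetical.

{code:java}
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FilterDirectory;
import org.apache.lucene.store.IOContext;
import org.apache.lucene.store.IndexInput;

// Hypothetical handle to remote storage (S3, GCS, HDFS, ...).
interface BlobStore {
  InputStream read(String fileName) throws IOException;
}

// Sketch: delegate reads to a local directory (e.g. an MMapDirectory over
// localPath), pulling each file from the blob store on first access.
class CachingBlobDirectory extends FilterDirectory {
  private final BlobStore store;
  private final Path localPath;

  CachingBlobDirectory(Directory local, Path localPath, BlobStore store) {
    super(local);
    this.localPath = localPath;
    this.store = store;
  }

  @Override
  public IndexInput openInput(String name, IOContext context) throws IOException {
    Path cached = localPath.resolve(name);
    if (!Files.exists(cached)) {
      try (InputStream in = store.read(name)) {
        Files.copy(in, cached); // only the first read pays the remote cost
      }
    }
    return super.openInput(name, context); // local-speed read from here on
  }
}
{code}

Because Lucene index files are write-once, a cached copy never goes stale, which is what makes a simple cache-on-first-read scheme viable.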
[jira] [Commented] (LUCENE-9570) Review code diffs after automatic formatting and correct problems before it is applied
[ https://issues.apache.org/jira/browse/LUCENE-9570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17253719#comment-17253719 ] Dawid Weiss commented on LUCENE-9570: - Very long string concatenations (and arithmetic expressions) get broken over several lines.
{code}
if (VERBOSE_DELETES) {
-  return "gen=" + gen + " numTerms=" + numTermDeletes + ", deleteTerms=" + deleteTerms
-      + ", deleteQueries=" + deleteQueries + ", fieldUpdates=" + fieldUpdates
-      + ", bytesUsed=" + bytesUsed;
+  return "gen="
+      + gen
+      + " numTerms="
+      + numTermDeletes
+      + ", deleteTerms="
+      + deleteTerms
+      + ", deleteQueries="
+      + deleteQueries
+      + ", fieldUpdates="
+      + fieldUpdates
+      + ", bytesUsed="
+      + bytesUsed;
{code}
If it's semantically important, this can be manually grouped using parentheses; then the formatted code will preserve the groups:
{code}
if (VERBOSE_DELETES) {
  return ("gen=" + gen)
      + (" numTerms=" + numTermDeletes)
      + (", deleteTerms=" + deleteTerms)
      + (", deleteQueries=" + deleteQueries)
      + (", fieldUpdates=" + fieldUpdates)
      + (", bytesUsed=" + bytesUsed);
{code}
> Review code diffs after automatic formatting and correct problems before it > is applied > -- > > Key: LUCENE-9570 > URL: https://issues.apache.org/jira/browse/LUCENE-9570 > Project: Lucene - Core > Issue Type: Sub-task >Reporter: Dawid Weiss >Assignee: Dawid Weiss >Priority: Blocker > Time Spent: 10m > Remaining Estimate: 0h > > Review and correct all the javadocs before they're messed up by automatic > formatting. Apply project-by-project, review diff, correct. Lots of diffs but > it should be relatively quick. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-14923) Indexing performance is unacceptable when child documents are involved
[ https://issues.apache.org/jira/browse/SOLR-14923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17253756#comment-17253756 ] David Smiley commented on SOLR-14923: - I pushed a branch_8x back-port to my fork which only needed a couple trivial changes in NestedShardedAtomicUpdateTest: https://github.com/dsmiley/lucene-solr/commit/0f08442c7af7f171ffcb36434bd07552303dd88f Please do run a performance test! > Indexing performance is unacceptable when child documents are involved > -- > > Key: SOLR-14923 > URL: https://issues.apache.org/jira/browse/SOLR-14923 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) > Components: update, UpdateRequestProcessors >Affects Versions: 8.3, 8.4, 8.5, 8.6, 8.7, master (9.0) >Reporter: Thomas Wöckinger >Priority: Critical > Labels: performance, pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > Parallel indexing does not make sense at moment when child documents are used. > The org.apache.solr.update.processor.DistributedUpdateProcessor checks at the > end of the method doVersionAdd if Ulog caches should be refreshed. > This check will return true if any child document is included in the > AddUpdateCommand. > If so ulog.openRealtimeSearcher(); is called, this call is very expensive, > and executed in a synchronized block of the UpdateLog instance, therefore all > other operations on the UpdateLog are blocked too. > Because every important UpdateLog method (add, delete, ...) is done using a > synchronized block almost each operation is blocked. > This reduces multi threaded index update to a single thread behavior. > The described behavior is not depending on any option of the UpdateRequest, > so it does not make any difference if 'waitFlush', 'waitSearcher' or > 'softCommit' is true or false. > The described behavior makes the usage of ChildDocuments useless, because the > performance is unacceptable. > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
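The locking behavior described in the issue is easy to model in isolation: when an expensive call shares one monitor with every cheap update operation, all indexing threads serialize behind it. A toy illustration follows (deliberately simplified; not the real Solr UpdateLog):

{code:java}
// Toy model of the contention pattern described above.
class ToyUpdateLog {
  synchronized void add(String doc) {
    // cheap bookkeeping, but it still has to acquire the shared monitor
  }

  synchronized void openRealtimeSearcher() {
    // stands in for the expensive searcher reopen; while this runs,
    // every add() from every other thread is blocked on the same monitor
    try {
      Thread.sleep(100);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }
}
{code}

If the expensive method runs on every child-document batch, N indexing threads degrade to roughly single-threaded throughput, which matches the behavior reported.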
[GitHub] [lucene-solr] madrob closed pull request #2118: SOLR-15031: Prevent null being wrapped in a QueryValueSource
madrob closed pull request #2118: URL: https://github.com/apache/lucene-solr/pull/2118 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-15031) NPE caused by FunctionQParser returning a null ValueSource
[ https://issues.apache.org/jira/browse/SOLR-15031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17253779#comment-17253779 ] ASF subversion and git services commented on SOLR-15031: Commit 9d19a5893621766be6ffd0002ef0997da6847aa5 in lucene-solr's branch refs/heads/branch_8x from Pieter van Boxtel [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=9d19a58 ] SOLR-15031 Prevent null being wrapped in a QueryValueSource closes #2118 > NPE caused by FunctionQParser returning a null ValueSource > -- > > Key: SOLR-15031 > URL: https://issues.apache.org/jira/browse/SOLR-15031 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Pieter >Priority: Minor > Time Spent: 2h 10m > Remaining Estimate: 0h > > When parsing a sub query in a function query, > {{FunctionQParser#parseValueSource}} does not check if the produced query > object is null. When it is, it just wraps a null in a {{QueryValueSource}} > object. This is a cause for NPE's in code consuming that object. Parsed > queries can be null, for example when the query string only contains > stopwords, so we need handle that condition. > h3. Steps to reproduce the issue > # Start solr with the techproducts example collection: {{solr start -e > techproducts}} > # Add a stopword to > SOLR_DIR/example/techproducts/solr/techproducts/conf/stopwords.txt, for > example "at" > # Reload the core > # Execute a function query: > {code:java} > http://localhost:8983/solr/techproducts/select?fieldquery={!field%20f=features%20v=%27%22at%22%27}&q={!func}%20if($fieldquery,1,0){code} > The following stacktrace is produced: > {code:java} > 2020-12-03 13:35:38.868 INFO (qtp2095677157-21) [ x:techproducts] > o.a.s.c.S.Request [techproducts] webapp=/solr path=/select > params={q={!func}+if($fieldquery,1,0)&fieldquery={!field+f%3Dfeatures+v%3D'"at"'}} > status=500 QTime=34 > 2020-12-03 13:35:38.872 ERROR (qtp2095677157-21) [ x:techproducts] > o.a.s.s.HttpSolrCall null:java.lang.NullPointerException > at > org.apache.lucene.queries.function.valuesource.QueryValueSource.hashCode(QueryValueSource.java:63) > at > org.apache.lucene.queries.function.valuesource.IfFunction.hashCode(IfFunction.java:129) > at > org.apache.lucene.queries.function.FunctionQuery.hashCode(FunctionQuery.java:176) > at > org.apache.solr.search.QueryResultKey.(QueryResultKey.java:53) > at > org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1341) > at > org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:580) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-15031) NPE caused by FunctionQParser returning a null ValueSource
[ https://issues.apache.org/jira/browse/SOLR-15031?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17253780#comment-17253780 ] ASF subversion and git services commented on SOLR-15031: Commit 98f12f4aeb9f6ec9f5c4de53f9faddc41043df59 in lucene-solr's branch refs/heads/master from Pieter van Boxtel [ https://gitbox.apache.org/repos/asf?p=lucene-solr.git;h=98f12f4 ] SOLR-15031 Prevent null being wrapped in a QueryValueSource closes #2118 > NPE caused by FunctionQParser returning a null ValueSource > -- > > Key: SOLR-15031 > URL: https://issues.apache.org/jira/browse/SOLR-15031 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Pieter >Priority: Minor > Time Spent: 2h 10m > Remaining Estimate: 0h > > When parsing a sub query in a function query, > {{FunctionQParser#parseValueSource}} does not check if the produced query > object is null. When it is, it just wraps a null in a {{QueryValueSource}} > object. This is a cause for NPE's in code consuming that object. Parsed > queries can be null, for example when the query string only contains > stopwords, so we need handle that condition. > h3. Steps to reproduce the issue > # Start solr with the techproducts example collection: {{solr start -e > techproducts}} > # Add a stopword to > SOLR_DIR/example/techproducts/solr/techproducts/conf/stopwords.txt, for > example "at" > # Reload the core > # Execute a function query: > {code:java} > http://localhost:8983/solr/techproducts/select?fieldquery={!field%20f=features%20v=%27%22at%22%27}&q={!func}%20if($fieldquery,1,0){code} > The following stacktrace is produced: > {code:java} > 2020-12-03 13:35:38.868 INFO (qtp2095677157-21) [ x:techproducts] > o.a.s.c.S.Request [techproducts] webapp=/solr path=/select > params={q={!func}+if($fieldquery,1,0)&fieldquery={!field+f%3Dfeatures+v%3D'"at"'}} > status=500 QTime=34 > 2020-12-03 13:35:38.872 ERROR (qtp2095677157-21) [ x:techproducts] > o.a.s.s.HttpSolrCall null:java.lang.NullPointerException > at > org.apache.lucene.queries.function.valuesource.QueryValueSource.hashCode(QueryValueSource.java:63) > at > org.apache.lucene.queries.function.valuesource.IfFunction.hashCode(IfFunction.java:129) > at > org.apache.lucene.queries.function.FunctionQuery.hashCode(FunctionQuery.java:176) > at > org.apache.solr.search.QueryResultKey.(QueryResultKey.java:53) > at > org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1341) > at > org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:580) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Resolved] (SOLR-15031) NPE caused by FunctionQParser returning a null ValueSource
[ https://issues.apache.org/jira/browse/SOLR-15031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Drob resolved SOLR-15031. -- Fix Version/s: master (9.0) 8.8 Assignee: Mike Drob Resolution: Fixed Merged this in, thanks for the contribution! > NPE caused by FunctionQParser returning a null ValueSource > -- > > Key: SOLR-15031 > URL: https://issues.apache.org/jira/browse/SOLR-15031 > Project: Solr > Issue Type: Bug > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Pieter >Assignee: Mike Drob >Priority: Minor > Fix For: 8.8, master (9.0) > > Time Spent: 2h 10m > Remaining Estimate: 0h > > When parsing a sub query in a function query, > {{FunctionQParser#parseValueSource}} does not check if the produced query > object is null. When it is, it just wraps a null in a {{QueryValueSource}} > object. This is a cause for NPE's in code consuming that object. Parsed > queries can be null, for example when the query string only contains > stopwords, so we need handle that condition. > h3. Steps to reproduce the issue > # Start solr with the techproducts example collection: {{solr start -e > techproducts}} > # Add a stopword to > SOLR_DIR/example/techproducts/solr/techproducts/conf/stopwords.txt, for > example "at" > # Reload the core > # Execute a function query: > {code:java} > http://localhost:8983/solr/techproducts/select?fieldquery={!field%20f=features%20v=%27%22at%22%27}&q={!func}%20if($fieldquery,1,0){code} > The following stacktrace is produced: > {code:java} > 2020-12-03 13:35:38.868 INFO (qtp2095677157-21) [ x:techproducts] > o.a.s.c.S.Request [techproducts] webapp=/solr path=/select > params={q={!func}+if($fieldquery,1,0)&fieldquery={!field+f%3Dfeatures+v%3D'"at"'}} > status=500 QTime=34 > 2020-12-03 13:35:38.872 ERROR (qtp2095677157-21) [ x:techproducts] > o.a.s.s.HttpSolrCall null:java.lang.NullPointerException > at > org.apache.lucene.queries.function.valuesource.QueryValueSource.hashCode(QueryValueSource.java:63) > at > org.apache.lucene.queries.function.valuesource.IfFunction.hashCode(IfFunction.java:129) > at > org.apache.lucene.queries.function.FunctionQuery.hashCode(FunctionQuery.java:176) > at > org.apache.solr.search.QueryResultKey.(QueryResultKey.java:53) > at > org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1341) > at > org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:580) > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
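The essence of the merged fix, per the description above, is a null check before wrapping. A simplified sketch follows; the actual change lives in {{FunctionQParser#parseValueSource}}, and the {{MatchNoDocsQuery}} fallback here is one plausible guard, not necessarily the exact code merged:

{code:java}
import org.apache.lucene.queries.function.ValueSource;
import org.apache.lucene.queries.function.valuesource.QueryValueSource;
import org.apache.lucene.search.MatchNoDocsQuery;
import org.apache.lucene.search.Query;

// Guard against a null sub-query (e.g. query text that is all stopwords)
// before handing it to QueryValueSource, whose hashCode() would otherwise NPE.
ValueSource toValueSource(Query subQuery) {
  Query q = (subQuery == null) ? new MatchNoDocsQuery() : subQuery;
  return new QueryValueSource(q, 0.0f);
}
{code}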
[jira] [Created] (LUCENE-9647) Add back ant precommit on PR for branch 8
Mike Drob created LUCENE-9647: - Summary: Add back ant precommit on PR for branch 8 Key: LUCENE-9647 URL: https://issues.apache.org/jira/browse/LUCENE-9647 Project: Lucene - Core Issue Type: Task Components: general/build Reporter: Mike Drob When migrating everything to gradle only, we accidentally deleted our branch_8x PR precommit action. The file needs to be on master, but the branch specification should specify branch_8x I believe. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] dsmiley opened a new pull request #2162: SOLR-15051 Blob, DRAFT WIP
dsmiley opened a new pull request #2162: URL: https://github.com/apache/lucene-solr/pull/2162 https://issues.apache.org/jira/browse/SOLR-15051 Remember this is very WIP... just getting started here. CC @bruno-roustant @NazerkeBS @atris This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Commented] (SOLR-15051) Shared storage -- BlobDirectory (de-duping)
[ https://issues.apache.org/jira/browse/SOLR-15051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17253791#comment-17253791 ] David Smiley commented on SOLR-15051: - In Solr we have a "BackupRepository" abstraction already for backup/restore. I rather like it; it's somewhat generic with a bias towards working with Lucene. It has "Backup" in its name but it might as well be called something like "RemoteStorageRepository" that is not just usable for backup/restore but also as a backing abstraction for shared storage generally. Interestingly, I see it uses the Lucene {{IndexInput}} for reading (instead of an InputStream), and supports copying via Lucene's {{Directory}} abstraction as well. CC [~gerlowskija] [~atris] [~varun] [~hgadre] > Shared storage -- BlobDirectory (de-duping) > --- > > Key: SOLR-15051 > URL: https://issues.apache.org/jira/browse/SOLR-15051 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: David Smiley >Assignee: David Smiley >Priority: Major > Time Spent: 10m > Remaining Estimate: 0h > > This proposal is a way to accomplish shared storage in SolrCloud with a few > key characteristics: (A) using a Directory implementation, (B) delegates to a > backing local file Directory as a kind of read/write cache (C) replicas have > their own "space", (D) de-duplication across replicas via reference > counting, (E) uses ZK but separately from SolrCloud stuff. > The Directory abstraction is a good one, and helps isolate shared storage > from the rest of SolrCloud that doesn't care. Using a backing normal file > Directory is faster for reads and is simpler than Solr's HDFSDirectory's > BlockCache. Replicas having their own space solves the problem of multiple > writers (e.g. of the same shard) trying to own and write to the same space, > and it implies that any of Solr's replica types can be used along with what > goes along with them like peer-to-peer replication (sometimes faster/cheaper > than pulling from shared storage). A de-duplication feature solves needless > duplication of files across replicas and from parent shards (i.e. from shard > splitting). The de-duplication feature requires a place to cache directory > listings so that they can be shared across replicas and atomically updated; > this is handled via ZooKeeper. Finally, some sort of Solr daemon / > auto-scaling code should be added to implement "autoAddReplicas", especially > to provide for a scenario where the leader is gone and can't be replicated > from directly but we can access shared storage. > For more about shared storage concepts, consider looking at the description > in SOLR-13101 and the linked Google Doc. > *[PROPOSAL > DOC|https://docs.google.com/document/d/1kjQPK80sLiZJyRjek_Edhokfc5q9S3ISvFRM2_YeL8M/edit?usp=sharing]* -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
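For readers who have not seen it, the rough shape of such an abstraction is sketched below. This is illustrative only: the name and the method signatures are assumptions for discussion, not the real {{BackupRepository}} interface.

{code:java}
import java.io.IOException;
import java.io.OutputStream;
import java.net.URI;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.IOContext;
import org.apache.lucene.store.IndexInput;

// Hypothetical "RemoteStorageRepository" in the spirit of the comment
// above: generic remote storage with a bias towards Lucene types.
public interface RemoteStorageRepository {
  boolean exists(URI path) throws IOException;
  String[] listAll(URI dirPath) throws IOException;
  // Lucene-friendly read path, mirroring the IndexInput observation above
  IndexInput openInput(URI dirPath, String fileName, IOContext ctx) throws IOException;
  OutputStream createOutput(URI filePath) throws IOException;
  // copy a file straight out of a Lucene Directory into remote storage
  void copyFileFrom(Directory sourceDir, String fileName, URI destDir) throws IOException;
}
{code}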
[GitHub] [lucene-solr] madrob opened a new pull request #2163: LUCENE-9647 Add back github action for Ant
madrob opened a new pull request #2163: URL: https://github.com/apache/lucene-solr/pull/2163 https://issues.apache.org/jira/browse/LUCENE-9647 Tagging @tflobbe for review because you removed the file in the first place. If there's a better place to put it, let me know. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] madrob commented on a change in pull request #2148: SOLR-15052: Per-replica states for reducing overseer bottlenecks
madrob commented on a change in pull request #2148: URL: https://github.com/apache/lucene-solr/pull/2148#discussion_r547542479 ## File path: solr/test-framework/src/java/org/apache/solr/cloud/SolrCloudTestCase.java ## @@ -81,6 +81,7 @@ public class SolrCloudTestCase extends SolrTestCaseJ4 { private static final Logger log = LoggerFactory.getLogger(MethodHandles.lookup().lookupClass()); + public static final Boolean USE_PER_REPLICA_STATE = Boolean.parseBoolean(System.getProperty("use.per-replica", "false")); Review comment: Can it be a setting in cluster properties that controls how new collections are created? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] madrob commented on a change in pull request #2148: SOLR-15052: Per-replica states for reducing overseer bottlenecks
madrob commented on a change in pull request #2148: URL: https://github.com/apache/lucene-solr/pull/2148#discussion_r547542705 ## File path: solr/core/src/java/org/apache/solr/cloud/ZkController.java ## @@ -1609,12 +1611,41 @@ public void publish(final CoreDescriptor cd, final Replica.State state, boolean if (updateLastState) { cd.getCloudDescriptor().setLastPublished(state); } - overseerJobQueue.offer(Utils.toJSON(m)); + DocCollection coll = zkStateReader.getCollection(collection); + if (forcePublish || sendToOverseer(coll, coreNodeName)) { Review comment: Some aspects of state are stored in the old place, and some are stored in the new place. I'm still working on building a full mental model, so maybe this is the wrong question. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene-solr] madrob commented on a change in pull request #2148: SOLR-15052: Per-replica states for reducing overseer bottlenecks
madrob commented on a change in pull request #2148: URL: https://github.com/apache/lucene-solr/pull/2148#discussion_r547543008 ## File path: solr/solrj/src/java/org/apache/solr/common/cloud/PerReplicaStates.java ## @@ -0,0 +1,587 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.solr.common.cloud; + +import java.io.IOException; +import java.lang.invoke.MethodHandles; +import java.util.ArrayList; +import java.util.Collections; +import java.util.HashSet; +import java.util.LinkedHashMap; +import java.util.List; +import java.util.Map; +import java.util.Objects; +import java.util.Set; +import java.util.function.BiConsumer; + +import org.apache.solr.cluster.api.SimpleMap; +import org.apache.solr.common.MapWriter; +import org.apache.solr.common.SolrException; +import org.apache.solr.common.annotation.JsonProperty; +import org.apache.solr.common.util.ReflectMapWriter; +import org.apache.solr.common.util.StrUtils; +import org.apache.solr.common.util.WrappedSimpleMap; +import org.apache.zookeeper.CreateMode; +import org.apache.zookeeper.KeeperException; +import org.apache.zookeeper.data.ACL; +import org.apache.zookeeper.data.Stat; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import static java.util.Collections.singletonList; +import static org.apache.solr.common.params.CommonParams.NAME; +import static org.apache.solr.common.params.CommonParams.VERSION; + +/** + * This represents the individual replica states in a collection + * This is an immutable object. 
When states are modified, a new instance is constructed + */ +public class PerReplicaStates implements ReflectMapWriter { + private static final Logger log = LoggerFactory.getLogger(MethodHandles.lookup().lookupClass()); + public static final char SEPARATOR = ':'; + + + @JsonProperty + public final String path; + + @JsonProperty + public final int cversion; + + @JsonProperty + public final SimpleMap states; + + public PerReplicaStates(String path, int cversion, List states) { +this.path = path; +this.cversion = cversion; +Map tmp = new LinkedHashMap<>(); + +for (String state : states) { + State rs = State.parse(state); + if (rs == null) continue; + State existing = tmp.get(rs.replica); + if (existing == null) { +tmp.put(rs.replica, rs); + } else { +tmp.put(rs.replica, rs.insert(existing)); + } +} +this.states = new WrappedSimpleMap<>(tmp); + + } + + /**Get the changed replicas + */ + public static Set findModifiedReplicas(PerReplicaStates old, PerReplicaStates fresh) { +Set result = new HashSet<>(); +if (fresh == null) { + old.states.forEachKey(result::add); + return result; +} +old.states.forEachEntry((s, state) -> { + // the state is modified or missing + if (!Objects.equals(fresh.get(s) , state)) result.add(s); +}); +fresh.states.forEachEntry((s, state) -> { if (old.get(s) == null ) result.add(s); +}); +return result; + } + + /** + * This is a persist operation with retry if a write fails due to stale state + */ + public static void persist(WriteOps ops, String znode, SolrZkClient zkClient) throws KeeperException, InterruptedException { +try { + persist(ops.get(), znode, zkClient); +} catch (KeeperException.NodeExistsException | KeeperException.NoNodeException e) { + //state is stale + log.info("stale state for {} . retrying...", znode); + List freshOps = ops.get(PerReplicaStates.fetch(znode, zkClient, null)); + persist(freshOps, znode, zkClient); + log.info("retried for stale state {}, succeeded", znode); +} + } + + /** + * Persist a set of operations to Zookeeper + */ + public static void persist(List operations, String znode, SolrZkClient zkClient) throws KeeperException, InterruptedException { +if (operations == null || operations.isEmpty()) return; +log.debug("Per-replica state being persisted for :{}, ops: {}", znode, operations); + +List ops = new ArrayList<>(operations.size()); +for (Op op : operations) { + //the state of the replica is being updated + String path = znode + "/" + op.state.asString; + List acl
[GitHub] [lucene-solr] trdillon commented on a change in pull request #2152: SOLR-14034: remove deprecated min_rf references
trdillon commented on a change in pull request #2152: URL: https://github.com/apache/lucene-solr/pull/2152#discussion_r547561917 ## File path: solr/core/src/test/org/apache/solr/cloud/HttpPartitionTest.java ## @@ -548,9 +548,6 @@ protected int sendDoc(int docId, Integer minRf, SolrClient solrClient, String co doc.addField("a_t", "hello" + docId); UpdateRequest up = new UpdateRequest(); -if (minRf != null) { - up.setParam(UpdateRequest.MIN_REPFACT, String.valueOf(minRf)); -} Review comment: @cpoerschke Thanks for all your help with this one. `sendDoc` seems OK to remove from `HttpPartitionTest` as it's only passed `minRf` as an argument, but in `ReplicationFactorTest` it is passed `expectedRf` as an argument in `addDocs`: https://github.com/apache/lucene-solr/blob/98f12f4aeb9f6ec9f5c4de53f9faddc41043df59/solr/core/src/test/org/apache/solr/cloud/ReplicationFactorTest.java#L417 `sendDocsWithRetry` is implemented here and used in a few different test classes (`DistributedVersionInfoTest`, `LeaderFailoverAfterPartitionTest` and `ForceLeaderTest`) with `minRf` as an argument: https://github.com/apache/lucene-solr/blob/98f12f4aeb9f6ec9f5c4de53f9faddc41043df59/solr/test-framework/src/java/org/apache/solr/cloud/AbstractFullDistribZkTestBase.java#L941 But it's also implemented using `expectedRfDBQ` and `expectedRf` in `ReplicationFactorTest`: - https://github.com/apache/lucene-solr/blob/98f12f4aeb9f6ec9f5c4de53f9faddc41043df59/solr/core/src/test/org/apache/solr/cloud/ReplicationFactorTest.java#L230 - https://github.com/apache/lucene-solr/blob/98f12f4aeb9f6ec9f5c4de53f9faddc41043df59/solr/core/src/test/org/apache/solr/cloud/ReplicationFactorTest.java#L427 I wasn't sure how to proceed with this because, looking at the tests, it seems that `expectedRf` is relied upon quite often. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[jira] [Updated] (SOLR-15052) Reducing overseer bottlenecks using per-replica states
[ https://issues.apache.org/jira/browse/SOLR-15052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Noble Paul updated SOLR-15052: -- Issue Type: Improvement (was: Bug) > Reducing overseer bottlenecks using per-replica states > -- > > Key: SOLR-15052 > URL: https://issues.apache.org/jira/browse/SOLR-15052 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Ishan Chattopadhyaya >Priority: Major > Attachments: per-replica-states-gcp.pdf > > Time Spent: 3h 40m > Remaining Estimate: 0h > > This work has the same goal as SOLR-13951, that is to reduce overseer > bottlenecks by avoiding replica state updates from going to the state.json > via the overseer. However, the approach taken here is different from > SOLR-13951 and hence this work supercedes that work. > The design proposed is here: > https://docs.google.com/document/d/1xdxpzUNmTZbk0vTMZqfen9R3ArdHokLITdiISBxCFUg/edit > Briefly, > # Every replica's state will be in a separate znode nested under the > state.json. It has the name that encodes the replica name, state, leadership > status. > # An additional children watcher to be set on state.json for state changes. > # Upon a state change, a ZK multi-op to delete the previous znode and add a > new znode with new state. > Differences between this and SOLR-13951, > # In SOLR-13951, we planned to leverage shard terms for per shard states. > # As a consequence, the code changes required for SOLR-13951 were massive (we > needed a shard state provider abstraction and introduce it everywhere in the > codebase). > # This approach is a drastically simpler change and design. > Credits for this design and the PR is due to [~noble.paul]. > [~markrmil...@gmail.com], [~noble.paul] and I have collaborated on this > effort. The reference branch takes a conceptually similar (but not identical) > approach. > I shall attach a PR and performance benchmarks shortly. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
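Step 3 of the quoted design (the delete-plus-create multi-op) looks roughly like the following against the plain ZooKeeper client. The znode name encoding ("replica:version:state") and paths here are assumptions for illustration; the PR's actual encoding may differ.

{code:java}
import java.util.Arrays;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.Op;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

// Atomically replace one replica's state child under state.json.
// Both operations succeed or fail together, so a children watcher on
// the parent never observes a half-applied transition.
void markReplicaDown(ZooKeeper zk) throws KeeperException, InterruptedException {
  String parent = "/collections/c1/state.json";
  zk.multi(Arrays.asList(
      Op.delete(parent + "/core_node1:2:A", -1), // old state child; -1 matches any version
      Op.create(parent + "/core_node1:3:D",      // new state child (illustrative name encoding)
          new byte[0], ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT)));
}
{code}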
[jira] [Commented] (SOLR-15052) Reducing overseer bottlenecks using per-replica states
[ https://issues.apache.org/jira/browse/SOLR-15052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17253820#comment-17253820 ] Noble Paul commented on SOLR-15052: --- {quote}Then the {{R5}} update is also going to read the directory listing and execute. {quote} R5 would have gotten a callback and it would've updated the per-replica-states anyway. So, all that we are doing is an extra {{stat}} read, which is extremely cheap. {quote}With 500 children znodes, getChildren took on my laptop about 10-15ms while getData on a single file with equivalent amount of text took longer at ~20ms. This came as a surprise to me. {quote} Reads are not such a big deal. Even writes are not a big deal. But CAS writes are a big deal. We would like to minimize contention while doing CAS writes. {quote}The multi operation (delete znode, create znode) took about 40ms while the CAS of the text file was faster at 30ms, {quote} CAS in itself is not slow. As the number of parallel writes grows, the performance degrades dramatically. If you have thousands of replicas trying to update using CAS, the performance is going to be unacceptably low. Whereas the {{multi}} approach on individual nodes will perform the same irrespective of how many replicas we have. {quote}The implementation in the PR could easily avoid systematically re-reading the znode children list by attempting the multi operation on the cached PerReplicaStates of the DocCollection {quote} It already uses the cached data. Yes, it does an extra version check, but that's cheap. > Reducing overseer bottlenecks using per-replica states > -- > > Key: SOLR-15052 > URL: https://issues.apache.org/jira/browse/SOLR-15052 > Project: Solr > Issue Type: Improvement > Security Level: Public(Default Security Level. Issues are Public) >Reporter: Ishan Chattopadhyaya >Priority: Major > Attachments: per-replica-states-gcp.pdf > > Time Spent: 3h 40m > Remaining Estimate: 0h > > This work has the same goal as SOLR-13951, that is to reduce overseer > bottlenecks by avoiding replica state updates from going to the state.json > via the overseer. However, the approach taken here is different from > SOLR-13951 and hence this work supercedes that work. > The design proposed is here: > https://docs.google.com/document/d/1xdxpzUNmTZbk0vTMZqfen9R3ArdHokLITdiISBxCFUg/edit > Briefly, > # Every replica's state will be in a separate znode nested under the > state.json. It has the name that encodes the replica name, state, leadership > status. > # An additional children watcher to be set on state.json for state changes. > # Upon a state change, a ZK multi-op to delete the previous znode and add a > new znode with new state. > Differences between this and SOLR-13951, > # In SOLR-13951, we planned to leverage shard terms for per shard states. > # As a consequence, the code changes required for SOLR-13951 were massive (we > needed a shard state provider abstraction and introduce it everywhere in the > codebase). > # This approach is a drastically simpler change and design. > Credits for this design and the PR is due to [~noble.paul]. > [~markrmil...@gmail.com], [~noble.paul] and I have collaborated on this > effort. The reference branch takes a conceptually similar (but not identical) > approach. > I shall attach a PR and performance benchmarks shortly. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
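For contrast, the CAS pattern whose contention is being avoided: every writer reads the shared znode, applies its change, and writes back with the expected version, retrying on conflict. With many concurrent writers, most attempts lose the race and loop. A sketch follows (not Solr code; {{applyMyChange}} is a hypothetical helper):

{code:java}
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

// Classic ZooKeeper compare-and-set loop on a single shared znode.
void casUpdate(ZooKeeper zk, String path) throws KeeperException, InterruptedException {
  while (true) {
    Stat stat = new Stat();
    byte[] current = zk.getData(path, false, stat);
    byte[] updated = applyMyChange(current); // hypothetical: edit the serialized state
    try {
      zk.setData(path, updated, stat.getVersion()); // fails if anyone wrote in between
      return;
    } catch (KeeperException.BadVersionException e) {
      // lost the race: re-read and retry; this loop is where throughput
      // collapses as the number of concurrent writers grows
    }
  }
}

byte[] applyMyChange(byte[] in) {
  return in; // placeholder for the actual state edit
}
{code}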
[GitHub] [lucene-solr] tflobbe commented on pull request #2163: LUCENE-9647 Add back github action for Ant
tflobbe commented on pull request #2163: URL: https://github.com/apache/lucene-solr/pull/2163#issuecomment-749862991 Does this go to master or to the 8.x branch? I don't think we had it running in 8.x before? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org